Introduction

In the dynamic world of business, companies are always looking for innovative solutions to enhance competitiveness, drive down costs, and augment profits while embracing sustainability. Enter Artificial Intelligence (AI), a transformative tool that goes beyond mere automation, particularly with the advent of generative AI. This blog aims to explore the deeper layers of how companies can not only leverage AI to cut costs and boost profits but also contribute to building a sustainable future.

1. Automation

At its core, AI’s role in automation extends far beyond streamlining routine processes. Integrating AI into automation processes enables a more nuanced understanding of data, allowing for predictive analysis and proactive decision-making. This, in turn, minimizes downtime and optimizes resource allocation. Moreover, AI-driven automation facilitates the identification of inefficiencies and bottlenecks that may go unnoticed in traditional systems, enabling companies to fine-tune their processes for maximum efficiency. In terms of cost reduction, AI excels in repetitive and rule-based tasks, reducing the need for manual labor and minimizing errors. Beyond the financial benefits, incorporating AI into automation aligns with sustainability goals by optimizing energy consumption, reducing waste, and improving overall resource management.

2. Predictive Analytics

AI’s real-time data processing capabilities empower companies with predictive analytics, offering a glimpse into the future of their operations. By analyzing historical data, AI forecasts market trends, customer behaviors, and potential risks. Consider a retail giant utilizing AI algorithms to predict customer preferences. This not only optimizes inventory management but also contributes to waste reduction and sustainability efforts.

By predicting future market trends, customer behavior, and operational needs, businesses can optimize their resource allocation, streamline operations, and minimize waste. This not only trims costs but also enhances profitability by aligning products and services with market demands. Moreover, predictive analytics enables companies to anticipate equipment failures, preventing costly downtime and contributing to a more sustainable operation. Harnessing the power of AI in predictive analytics is not just about crunching numbers; it’s about gaining insights that empower strategic decision-making, fostering a resilient and forward-thinking business model.

3. Personalization at Scale

Generative AI enables hyper-personalization by analyzing vast datasets to understand individual preferences, behaviors, and trends. Companies can utilize advanced algorithms to tailor products or services in real-time, offering a personalized experience that resonates with each customer. This not only fosters customer satisfaction but also drives increased sales and brand loyalty. On the cost front, AI streamlines operations through predictive analytics, optimizing supply chain management, and automating routine tasks. This not only reduces operational expenses but also enhances efficiency. In terms of sustainability, AI aids in resource optimization, minimizing waste and energy consumption. By understanding customer preferences at an intricate level, companies can produce and deliver exactly what is needed, mitigating excess production and waste.

4. Supply Chain Optimization

AI’s pivotal role in optimizing supply chains is revolutionizing sustainability efforts. Generative AI aids in demand forecasting, route optimization, and inventory management, minimizing waste and reducing the carbon footprint. Retail giants like Walmart have successfully implemented AI-powered supply chain solutions, resulting in substantial cost savings and environmental impact reduction.

AI can optimize various facets of the supply chain, from demand forecasting to inventory management. By analyzing historical data and real-time information, AI algorithms can make accurate predictions, preventing overstock or stockouts, thereby minimizing waste and maximizing efficiency. Additionally, AI-driven automation in logistics can streamline operations, cutting down on manual errors and reducing labor costs. Route optimization algorithms can optimize transportation, not only saving fuel and time but also curbing the carbon footprint. Predictive maintenance powered by AI ensures that equipment is serviced proactively, preventing costly breakdowns. Overall, the integration of AI into supply chain processes empowers companies to make data-driven decisions, fostering agility and resilience, ultimately translating into reduced costs, increased profits, and a more sustainable business model.

5. Predictive Maintenance

Generative AI’s impact extends to equipment maintenance, transforming the game by predicting machinery failures. Analyzing data from sensors and historical performance, AI algorithms forecast potential breakdowns, enabling proactive maintenance scheduling. This not only minimizes downtime but also significantly reduces overall maintenance costs, enhancing operational efficiency.

Picture this: instead of waiting for equipment to break down and incurring hefty repair costs, AI algorithms analyze historical data, sensor inputs, and various parameters to predict when machinery is likely to fail. This foresight enables businesses to schedule maintenance precisely when needed, minimizing downtime and maximizing productivity. This involves not just reacting to issues but proactively preventing them. By harnessing AI for predictive maintenance, companies can extend the lifespan of equipment, optimize resource allocation, and, ultimately, boost their bottom line. Moreover, reducing unplanned downtime inherently aligns with sustainability goals, as it cuts down on unnecessary resource consumption and waste associated with emergency repairs.
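
To make the idea concrete, here is a minimal, illustrative TypeScript sketch (not a production model; the sensor readings, threshold, and window size are hypothetical): it flags a machine for proactive servicing when the rolling average of its vibration readings drifts above a threshold learned from historical failures.

type Reading = { hour: number; vibration: number };

// Flag a machine for service when the average of its most recent readings
// exceeds a threshold (assumed here to come from historical failure data).
function needsMaintenance(history: Reading[], threshold: number, window = 6): boolean {
  const recent = history.slice(-window);
  const avg = recent.reduce((sum, r) => sum + r.vibration, 0) / recent.length;
  return avg > threshold; // true => schedule maintenance before an outright breakdown
}

// Hypothetical sensor feed with slowly rising vibration levels.
const feed: Reading[] = [1, 2, 3, 4, 5, 6, 7, 8].map(h => ({ hour: h, vibration: 5 + h * 0.4 }));
console.log(needsMaintenance(feed, 7.0)); // true: the recent average (7.2) exceeds the threshold

In practice the threshold and window would come from models trained on much richer telemetry, but the decision logic is the same: predict, then act before failure.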

6. Fraud Detection

The ability of AI to detect patterns and anomalies proves invaluable in combatting fraud. Financial institutions, for instance, deploy generative AI to analyze transaction patterns in real-time, identifying potentially fraudulent activities. This not only safeguards profits but also bolsters the company’s reputation by ensuring a secure environment for customers.

AI systems can analyze vast datasets with unprecedented speed and accuracy, identifying intricate patterns and anomalies that might escape human detection. By deploying advanced machine learning algorithms, companies can create dynamic models that adapt to emerging fraud trends, ensuring a proactive approach rather than a reactive one. This not only minimizes financial losses but also reduces the need for resource-intensive manual reviews. Additionally, AI-driven fraud detection enhances customer trust by swiftly addressing security concerns. By curbing fraud, companies not only protect their bottom line but also contribute to sustainability by fostering a more secure and resilient business environment. It’s a win-win scenario where technology not only safeguards financial interests but aligns with the broader ethos of responsible and enduring business practices.
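
As a toy illustration of the anomaly-detection idea (real systems use adaptive machine learning models over many more features), the TypeScript sketch below flags transaction amounts that deviate sharply from a customer’s usual spending; the figures are made up.

// Flag amounts whose z-score (distance from the mean in standard deviations)
// exceeds a limit; such transactions would be routed for review.
function flagOutliers(amounts: number[], zLimit = 2): number[] {
  const mean = amounts.reduce((a, b) => a + b, 0) / amounts.length;
  const variance = amounts.reduce((a, b) => a + (b - mean) ** 2, 0) / amounts.length;
  const std = Math.sqrt(variance) || 1; // avoid division by zero for constant histories
  return amounts.filter(a => Math.abs(a - mean) / std > zLimit);
}

// Everyday purchases plus one unusually large transfer (hypothetical data).
console.log(flagOutliers([42, 38, 55, 47, 61, 39, 4800])); // [4800]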

Conclusion

In conclusion, the integration of AI, especially generative AI, into business operations unveils many opportunities for companies seeking to reduce costs, increase profits, and champion sustainability. From the foundational efficiency of automation to the predictive prowess of analytics, and the personalized touch of generative AI, businesses can strategically utilize these tools for transformative outcomes. Supply chain optimization, predictive maintenance, and fraud detection further amplify the impact, showcasing the diverse applications of AI.

However, as organizations embark on this AI journey, ethical considerations and environmental consciousness must not be overlooked. Striking a balance between innovation and responsibility is paramount for sustained success. The future belongs to those companies that not only leverage AI for operational excellence but also actively contribute to creating a sustainable and equitable business landscape.

Introduction

Lately, there has been a viral buzz surrounding the term “generative AI.” It’s hard to scroll through social media without bumping into these mind-blowing, AI-generated hyper-realistic images and videos in various genres. These AI creations not only produce captivating visuals but also play a significant role in facilitating business growth, leaving us in awe.

While AI has been an integral part of our lives for quite some time, the current surge in creativity and complexity displayed in these generative creations can make it challenging to understand how the technology actually works.

If you’re an aspiring data analyst, machine learning engineer, or other professional who wishes to understand the basics of AI, this guide is for you. Let’s explore the different evolutions of artificial intelligence and the science behind it in simpler terms, and we’ll also delve into the top service providers of AI and how businesses leverage them in today’s landscape.

What is Artificial Intelligence?

Artificial Intelligence refers to the capability of machines to imitate human intelligence. This isn’t about robots replacing humans; rather, it’s the quest to make machines smart, enabling them to learn, reason, and solve problems autonomously.

AI empowers machines to acquire knowledge, adapt to changes, and independently make decisions. It’s like teaching a computer to think and act like a human.

Machine Learning

A crucial element of AI is machine learning (ML). In simpler terms, machine learning is akin to training computers to improve at tasks without providing detailed instructions: machines use data to learn and enhance their performance rather than being explicitly programmed. ML, a subset of AI, concentrates on creating algorithms that let computers learn from data, using statistical techniques to continually improve their performance over time.

Prominent Applications of ML include:

Time Series Forecasting: ML techniques analyze historical time series data to project future values or trends, applicable in domains like sales forecasting, stock market prediction, energy demand forecasting, and weather forecasting (a minimal sketch of this idea follows after this list).

Credit Scoring: ML models predict creditworthiness based on historical data, enabling lenders to evaluate credit risk and make well-informed decisions regarding loan approvals and interest rates.

Text Classification: ML models categorize text documents into predefined categories or sentiments, with applications such as spam filtering, sentiment analysis, topic classification, and content categorization.

Recommender Systems: ML algorithms are widely utilized in recommender systems to furnish personalized recommendations. These systems learn user preferences from historical data, suggesting relevant products, movies, music, or content.
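
To make the forecasting item above concrete, here is a deliberately naive TypeScript sketch; real projects would use models such as ARIMA, gradient boosting, or neural networks, and the sales figures below are hypothetical. It predicts the next value of a series as the average of the most recent observations.

// Predict the next value of a time series as the mean of the last `window` points.
function forecastNext(series: number[], window = 3): number {
  const recent = series.slice(-window);
  return recent.reduce((sum, v) => sum + v, 0) / recent.length;
}

// Hypothetical monthly sales figures trending upward.
const monthlySales = [120, 132, 128, 141, 150, 158];
console.log(forecastNext(monthlySales)); // ≈ 149.7, a naive estimate for next month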

Scaling a machine learning model to a larger dataset may compromise accuracy; another notable drawback is that relevant features must be determined manually by humans, based on business knowledge and statistical analysis. Additionally, ML algorithms face challenges when handling intricate tasks involving high-dimensional data or complex patterns. These limitations spurred the development of Deep Learning (DL) as a distinct branch.

Deep Learning

Taking ML to the next level, Deep Learning (DL) involves artificial neural networks inspired by the human brain, mimicking how our brains work. Employing deep neural networks with multiple layers, DL grasps hierarchical data representations, automating the extraction of relevant features and eliminating the need for manual feature engineering. DL excels at handling complex tasks and large datasets efficiently, achieving remarkable success in areas like computer vision, natural language processing, and speech recognition, despite its complexity and challenges in interpretation.

Common Applications of Deep Learning:

  • Autonomous Vehicles: DL is essential for self-driving cars, using deep neural networks for tasks like object detection, lane detection, and pedestrian tracking, allowing vehicles to understand and react to their surroundings.
  • Facial Recognition: DL is used in training neural networks to detect and identify human faces, enabling applications such as biometric authentication, surveillance systems, and personalized user experiences.
  • Precision Agriculture: Deep learning models analyze data from various sources like satellite imagery and sensors for crop management, disease detection, irrigation scheduling, and yield prediction, leading to more efficient and sustainable farming practices.

However, working with deep learning involves handling large datasets that require constant annotation, a process that can be time-consuming and expensive, particularly when done manually. Additionally, DL models lack interpretability, making it challenging to modify or understand their internal workings. Moreover, there are concerns about their robustness and security in real-world applications due to vulnerabilities exploited by adversarial attacks.

To address these challenges, Generative AI has emerged as a specific area within deep learning.

Generative AI

Now, let’s discuss Generative AI, the latest innovation in the field. Instead of just identifying patterns, generative AI goes a step further and actually produces new content. It aims to produce content that closely resembles what humans might create.

A notable example is Generative Adversarial Networks (GANs), which use advanced neural networks to create realistic content such as images, text, and music. Think of it as the creative side of AI. A prime example is deepfakes, where AI can generate hyper-realistic videos by modifying and combining existing footage. It’s both impressive and a bit eerie.

Generative AI finds applications in various areas:

  • Image Generation: This involves the model learning from a large set of images and creating new, unique images based on its training data. Such tools can generate imaginative images from prompts, much like human imagination.

  • Video Synthesis: Generative models can generate new content by learning from existing videos. This includes tasks like video prediction, where the model creates future frames from a sequence of input frames, and video synthesis, which involves generating entirely new videos. Video synthesis is useful in entertainment, special effects, and video game development.
  • Social Media Content Generation: Generative AI can automate content creation for social media platforms. By training models on extensive social media data, such as images and text, these models can produce engaging and personalized posts, captions, and visuals. The generated content is tailored to specific user preferences and current trends.

In a nutshell, AI is the big brain, Machine Learning is its learning process, Deep Learning is the intricate wiring, and Generative AI is the creative spark.

From spam filters to face recognition and deepfakes, these technologies are shaping our digital world. It’s not just about making things smart; it’s about making them smart in a way that feels almost, well, human.

Top Companies Leveraging AI in their Business:

As AI continues to advance and assert its influence in the business realm, an increasing number of companies are harnessing its capabilities to secure a competitive edge. Below are instances of businesses utilizing AI systems to optimize their operations:

Amazon: The renowned e-commerce retailer uses AI for diverse functions such as product recommendations, warehouse automation, and customer service. Amazon’s AI algorithms scrutinize customer data to furnish personalized product suggestions, while AI-powered robots in its warehouses enhance the efficiency of order fulfillment processes.

Netflix: This streaming service leverages AI to analyze user data and offer personalized content recommendations. By comprehending user preferences and viewing patterns, Netflix personalizes the viewing experience, ultimately boosting user engagement and satisfaction.

IBM: The multinational technology company utilizes its AI platform, Watson, across various sectors for tasks like data analysis, decision-making, and customer service. Watson adeptly analyzes extensive volumes of both structured and unstructured data, enabling businesses to obtain valuable insights and make more informed decisions.

Google: The prominent search engine giant integrates AI for search optimization, language translation, and advertising. Google’s AI algorithms possess the capability to comprehend and process natural language queries, deliver more precise search results, and furnish personalized advertising based on user data.

Conclusion

In conclusion, the rise of generative AI has undeniably captivated our imagination, showcasing its potential not only in creative endeavors but also as a driving force behind business growth.

As we witness the impressive applications of AI in companies like Amazon, Netflix, IBM, and Google, it becomes evident that AI’s transformative influence on various industries is profound.

Looking ahead, the question arises: What might follow generative AI? Could it be interactive AI? As businesses continue to embrace and leverage AI capabilities, the evolution of this technology holds the promise of more interactive and human-like experiences.

SAP BusinessObjects Data Services delivers a single enterprise-class solution for data integration, data quality, data profiling, and text data processing that allows you to integrate, transform, improve, and deliver trusted data to critical business processes. With SAP BusinessObjects Data Services, IT organizations can maximize operational efficiency with a single solution to improve data quality and gain access to heterogeneous sources and applications.

The important functions of SAP BODS are:

  • Extraction, Transformation, and Loading (ETL): Extracts data from any database or table, transforms it, and loads it into any other database or table.
  • Data warehousing: A database designed and structured in a particular format for data analysis and reporting, built using data from one or more databases or other data sources.
  • Data Migration: The process of moving data from one place to another. It is a subset of ETL in which data is relocated from one software system or database to another.
  • Business Intelligence: It analyses the data of an organization effectively and helps in improving business performance.

Logging into the SAP BODS Designer:

You must have access to a local repository to log into the software. Typically, you create a repository during installation. However, you can create a repository at any time using the Repository Manager and configure access rights within the Central Management Server.

  1. Enter your user credentials for the CMS.
  2. Click Log on. The software attempts to connect to the CMS using the specified information. When you log in successfully, the list of local repositories that are available to you is displayed.
  3. Select the repository you want to use.
  4. Click OK to log in using the selected repository.

BODS – Object Hierarchy

Designer window

The Designer user interface consists of a single application window and several embedded supporting windows.

  1. Project area: Contains the current project (and the job(s) and other objects within it) available to you at a given time. In the software, all entities you create, modify, or work with are objects.
  2. Workspace: The area of the application window in which you define, display, and modify objects.
  3. Local object library: Provides access to local repository objects including built-in system objects, such as transforms, and the objects you build and save, such as jobs and data flows.
  4. Tool palette: Buttons on the tool palette enable you to add new objects to the workspace.

Creating Datastores in DS Designer

To develop data migration work, you first need to create datastores for the source and the target systems.

Step 1:

Click Create Data Stores.

A new window will open.

Step 2:

Enter the Datastore name, Datastore type, and Database type. You can select a different database type for the source system, and you also need to provide the credentials for that particular database.

Step 3:

Click OK and the Datastore will be added to the Local object library list. If you expand the Datastore, it does not show any tables yet.

Data Migration Flow

Step 1:

Create a new project. Click the option, Create Project. Enter the Project Name and click Create. It will be added to the Project Area.

Step 2:

Right-click the project name and create a new batch job or real-time job.

Step 3:

Enter the name of the job and press Enter. You have to add a workflow and a data flow to it. Select a workflow and click the work area to add it to the job. Enter the name of the workflow and double-click it to add it to the Project area.

Step 4:

In a similar way, select the Data flow and bring it to the Project area. Enter the name of the data flow and double-click to add it under the new project.

Step 5:

Now drag the source table under the datastore to the work area. Then you can drag a target table with a similar data type to the work area, or you can create a new template table.

To create a new template table, right-click the source table and select Add New → Template Table. Alternatively, select it from the tool palette: click the template table icon and drag it inside a data flow to place the template table in the workspace.

Step 6:

Drag the Query transform to the workspace. Then connect the source table to the Query transform and the Query transform to the target table using the connecting lines. Click the Save All option at the top of the project menu.

Step 7:

Click on the Query transform and map the source Schema In columns that you want to include in the target table by dragging them.

Query Transform:
  • The Query transform is similar to a SQL SELECT statement.
  • It can perform the following operations:
    • Choose (filter) the data to extract from sources
    • Join data from multiple sources
    • Map columns from input to output schemas
    • Perform transformations and functions on the data
    • Add new columns, nested schemas, and function results to the output schema
    • Assign primary keys to output columns
  • Different functions, such as LOOKUP, AGGREGATE, and conversions, can be performed using the Query transform.
Step 8:

Click the Save All option at the top of the project menu. Now you can schedule the job using the Data Services Management Console, or you can execute it manually by right-clicking the job name and selecting Execute.

Once the job execution is complete, the data is transferred from the source to the target database based on the conditions specified in the Query transform.

In conclusion, SAP BusinessObjects Data Services (BODS) is a GUI tool that lets you create and monitor jobs that take data from various types of sources, perform complex transformations on that data as per the business requirements, and then load it into a target, which again can be of any type (i.e., an SAP application, a flat file, or any database).

CloudIQ Attains New Microsoft Solutions Partner Designations

CloudIQ is proud to announce that we have attained three Microsoft Solutions Partner designations under the new Microsoft Cloud Partner Program – Azure Infrastructure, Data & AI, and Digital & App Innovation. The new partner program replaces Microsoft Silver and Gold competencies with new Solutions Partner Designations.

Microsoft Solutions Partner Designations

For each of the six Solutions Partner designations – Infrastructure, Data & AI, Digital & App Innovation, Modern Work, Security, and Business Applications – under the new Microsoft Cloud Partner Program (MCPP), partners must meet requirements in three different categories: Performance, Skilling, and Customer Success.

And we are happy to share that we have attained three of the six solution partner designations.

Microsoft Specializations

On top of the Solution Partner Designations, Microsoft also has advanced specialization programs that help demonstrate advanced technical expertise.

CloudIQ has earned advanced specialization in the Modernization of Web Applications to Azure and Kubernetes on Microsoft Azure.

Attaining the Microsoft Solutions Partner designations and Microsoft advanced specializations demonstrates CloudIQ’s expertise and commitment to delivering best-in-class solutions for customers in any scenario and every industry.

We leverage our deep industry expertise to help businesses envision new products, create innovative business models, and deliver the next level of customer experiences by leveraging the cloud. From standalone cloud projects to enterprise-wide cloud architecture design you can rely on our cloud engineering expertise.

Get in touch with us to learn more.

Flutter is an open-source UI software development kit created by Google. It is used to develop cross-platform applications for Android, iOS, Linux, macOS, Windows, and the web from a single codebase. Flutter apps are written in the Dart language. Dart compiles to native machine code, which makes Flutter apps optimized and high-performing.

Flutter’s declarative UI style is inspired by React, and it is often compared with React Native, but there are a few key differences. Flutter supports Hot Reload, which lets developers inject code changes into a running app during development without a full restart. Flutter draws the entire UI with its own rendering engine, while React Native renders through the platform’s native UI components. Flutter apps are written in Dart, whereas React Native uses JavaScript – both have their strengths and weaknesses. Some may find Flutter easier to learn or be more familiar with it, while others may prefer React Native.

Creating a new flutter app

After installing Flutter on your machine, you can create a Flutter project using the flutter create command.

We can also create the project using IDEs like Visual Studio Code or Android Studio.

The main.dart file in the lib folder is where we build our app.

We can run the sample app on an Android emulator created using Android Studio. The sample app displays the number of times we have pressed the + button.

Widgets

Flutter has a unique architecture that makes it easy to develop cross-platform mobile apps. Its architecture is built around a widget tree. This means that all the widgets and components are arranged in a tree structure. In Flutter, you can create your own widgets and reuse them in any project.

Material widgets implement Google’s Material Design language across Android, iOS, and the web. Cupertino widgets implement the current iOS design language based on Apple’s Human Interface Guidelines. We mostly use the Material widgets in our code.

Some common widgets are:

1. Scaffold: Implements the basic Material Design visual layout structure. It occupies the entire window or device screen.

2. AppBar: The AppBar is usually the topmost component of the app; it contains the toolbar and some other common action buttons.

All remaining widgets in a Scaffold other than AppBar are usually defined in the ‘body’ property of the Scaffold.

3. Text: Used to display formatted text in the app.

4. Column: A widget that displays its children in a vertical array.

5. Row: A widget that displays its children in a horizontal array.


Stateless and Stateful widgets

The widgets whose state cannot be altered once they are built are called stateless widgets. Below is the basic structure of a stateless widget. A stateless widget overrides the build() method and returns a widget.

import 'package:flutter/material.dart';
 
class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return Container();
  }
}

The widgets whose state can be altered after they are built are called stateful widgets. Below is the basic structure of a stateful widget. A stateful widget overrides the createState() method and returns a State object. It is used when the UI can change dynamically.

import 'package:flutter/material.dart';
 
class MyApp extends StatefulWidget {
 
  @override
  // ignore: library_private_types_in_public_api
  _MyAppState createState() => _MyAppState();
}
 
class _MyAppState extends State<MyApp> {
  @override
  Widget build(BuildContext context) {
    return Container();
  }
}

Whenever we change the values of properties denoting the state, we must wrap the change in a call to the setState() function to tell Flutter to rebuild the widget and display the update.

setState(() {
  _counter++;
});

Hot Reload

Flutter allows hot reloading during development. Hot reloading lets us keep the app running and inject new versions of the files that we edited at runtime. This way, we don’t lose any of our state, which is especially useful when making UI changes. For example, in our sample code, if we change the primarySwatch to Colors.red, the app color changes from blue to red, but the counter still shows 1 and doesn’t get reset to 0.

In conclusion, Flutter is a framework built on the Dart programming language that can be used to create native apps for Android and iOS. Flutter uses a widget-based architecture, so you’ll be able to create apps that look and feel like the ones you’ve seen on the Apple App Store and Google Play.

As systems evolve, the need to migrate databases arises based on the data we are dealing with. In one of our projects, we had a scenario where non-transactional data had to be accessed frequently with low latency across regions. Azure Cosmos DB is Microsoft’s fast NoSQL database and the first globally distributed database service in the market today to offer comprehensive service level agreements encompassing throughput, latency, availability, and consistency.

So, the choice was clear for us to move the data from PostgreSQL to Cosmos DB. In PostgreSQL the data format is flat (i.e., in the form of tables and columns), while Cosmos DB supports flexible schemas and hierarchical data, and thus it is well suited for storing catalog data. The JSON format supported by Cosmos DB is effective and very lightweight.

For the data migration from PostgreSQL to Cosmos DB we chose Azure Data Factory. Azure Data Factory helps you integrate, transform, and visualize all your data with ease. It is an easy-to-use, cost-effective, fully serverless cloud service that accelerates data transformation with code-free data flows.

Data Transformation using Azure Data Factory

Azure Data Factory is an orchestration tool that is used to move and transform data from one source to another. It can process and transform raw data into predictions and insights, and it allows you to perform data transformation activities via pipelines. Data flows are created in debug mode to validate the logic of the transformation. The data flow activity is added to the pipeline to execute and test the data flow, and the Trigger Now option is used to test the data flow in the pipeline.

Several options are available for converting JSON data to flat data; however, converting flat data into JSON is still a challenge. Inserting null values into JSON within the data flow is quite tedious, and a dedicated pipeline must be created to handle null values.

Azure Data Factory has three main options:

  • Author
  • Monitor
  • Manage

The Author option provides the main environment for development. Using this option, we can design and manage Azure Data Factory resources such as the pipelines and dataflows.

The Monitor option allows you to monitor pipeline and trigger runs, sessions, and the time taken to execute the pipelines; check whether a pipeline run succeeded or failed; and set up alerts.

The Manage option allows you to manage the connections, link service, source control, triggers, parameters, and security.

Now let’s look at how to migrate from PostgreSQL to CosmosDB and transform the data.

Pipeline and activities creation

The Author option provides the environment for the development of pipelines and data flows. So, the first step is to create a pipeline that contains the data flow activity. A pipeline is a logical grouping of activities that perform a task. By clicking Orchestrate on the home page, we can create and name the pipeline.

The activities pane allows you to add various activities such as move, transform, data flow (we can use existing data flows or create new ones), Azure Data Explorer, Azure Functions, Batch Service, Databricks, Data Lake Analytics, etc. Once the data flow is created, we can provide the transformation logic in the data flow canvas. Data flows distribute the processing of data over different nodes in a Spark cluster to perform the operations in parallel. We need to create a mapping data flow to perform the transformation as well.

We can choose the format of the data as per the requirement. We can also select the dataset, which is simply a view of, or reference to, the data that you want to use in your activity.

In our data migration scenario, take an activity that copies data from the source (the PostgreSQL data directory): the data is taken from the source and put into blob storage. The basic details are stored in the first copy activity, the patient color details in the second, the microchip details in the third, the breed details in the fourth, and the existing patient details in the fifth copy activity. The data flow performs transformations on the data such as join, conditional join, aggregate, pivot, flatten, union, split, etc. Finally, triggers are scheduled to run the transformations in the pipelines.

Linked Service and Integration Runtime

When creating a dataset, we must first create a linked service to link the data store to Data Factory via the management hub. Linked services are much like connection strings and define the connection to the data source.

In our data migration scenario, we created the following linked services: Azure Managed Instance, Azure Blob storage, Cosmos DB (one for dev and one for UI), and PostgreSQL. To create a linked service, we must provide the server name, port, database name, username, password, etc. These are the inputs the data flow uses to connect to the specific database. In this migration, the PostgreSQL data is migrated from a separate VM environment into the Azure environment.

Here, the base directory is called a container; inside the containers we have several directories, which in turn contain the storage files.

The integration runtime via the management hub is the compute infrastructure for providing data integration capabilities such as data flow, data movement, activity dispatch, and SSIS package execution across different network environments. It provides the linkage between the activity and linked services. The default option is public for the Auto-resolve integration runtime. We can also create private customized ones based on the requirement for dataflow execution.

We hope this walkthrough of migrating data from PostgreSQL to Azure CosmosDB using Azure Data Factory was helpful. If you have queries or want us to help you with your data migration projects, feel free to reach out to us.

Angular is a component-based application design framework for building scalable, reactive, and efficient single-page apps. It provides a wide range of tools and well-integrated libraries to build, develop, and test your applications. Angular applications are written by composing HTML templates, and components are created to manage those templates. Application logic lives in services, and these services and components are grouped into modules.

Angular Evolution

The Angular framework was first introduced a decade ago, in 2010, under the name AngularJS. Over the years, the framework has evolved through various updates. AngularJS 1.x is a JavaScript-based framework for creating rich web and mobile applications. The Model-View-Controller (MVC) pattern and its variations, Model-View-Presenter (MVP) and Model-View-ViewModel (MVVM), form the AngularJS architecture. Even though AngularJS builds on proven architectural patterns and was a standard solution for applications with a view layer, it has certain disadvantages:

  • Memory leakage.
  • Internet Explorer 8.0 doesn’t support Angular JS.
  • AngularJS totally depends on JavaScript.

Angular 2+ is a TypeScript-based, free, open-source framework used to develop web and mobile applications. It is a component-based framework and has a collection of well-integrated libraries. Angular 2+ is very simple and pretty straightforward to use. It is built on TypeScript, which allows you to validate code with ease and surfaces errors as you type. Implementing forms and validation is much simpler and more effective with Angular 2+. With recent advancements, Angular’s responsive design has shifted towards a mobile-first approach.

Angular 3 was skipped because the framework is developed in a monorepo (a single repository) and its router package was already at version 3; skipping to Angular 4 avoided confusion around dependency versions. Angular 4 was released in 2017 and was compatible with both TypeScript 2.1 and 2.2. This version improved speed and performance to a good extent. Angular 5 was released at the end of 2017 with many new features and improvements: it provided an optimizer that removed unnecessary code from applications, an improved compiler, improvements to Angular Universal for code allocation, and, most importantly, support for TypeScript 2.4. Angular 6 was released in early 2018 and introduced Angular Elements and a new rendering engine. Angular 7 was released later that year with major performance improvements; it also provided a drag-and-drop module, Angular Material updates, and the Component Dev Kit. Angular 8 was released in 2019 with updated dependencies, improved web worker bundling, a new lazy-loading syntax, and Angular Firebase support.

Angular 9 was introduced in 2020 and provided new features and updates such as a more consistent ng update, an updated and improved API extractor, dependency injection updates, better speed and performance, AOT builds for a faster and better-performing compiler, and support for TypeScript 3.7. Angular 10 was released in June 2020 with new updates and features such as a new date range picker, an updated compiler, optional stricter settings to catch bugs earlier, performance improvements, and support for TypeScript 3.9.

Angular 11 was released in November 2020; its major improvements include router performance, automatic inlining of fonts, updated Hot Module Replacement (HMR), faster builds and improved performance, support for TypeScript 4.0 and Webpack 5, and component test harnesses with parallel functions and improved performance. Angular 12 was released in May 2021 with major improvements in styling, support for TypeScript 4.2 and Webpack 5.3.7, nullish coalescing for writing cleaner TypeScript code, an improved ng CLI, Protractor no longer included by default in new projects, new dev tools, and migration from legacy i18n message IDs. Angular 13 is also available, with impactful changes towards optimization: it no longer includes the View Engine, and with the reduced dependency on ngcc we can expect faster and improved compilation. Angular 13 supports TypeScript 4.4, provides an improved and modernized Angular Package Format and an improved Angular CLI, and no longer supports IE11.

Why Angular?

With the evolution of Angular over the years, it has become one of the most recommended frameworks for businesses and enterprises, for various reasons. Angular is used to develop single-page client applications using HTML and TypeScript. Angular offers two-way data binding, keeping the model and the view in sync: when data is modified or changed, components update automatically in real time. Code reusability in an Angular application is also high.
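
As a small illustration of two-way binding (the component and selector names below are made up, and [(ngModel)] requires FormsModule to be imported in the module), typing in the input immediately updates the component property and the greeting underneath it:

import { Component } from '@angular/core';

@Component({
  selector: 'app-hello',
  template: `
    <input [(ngModel)]="name" placeholder="Your name">
    <p>Hello, {{ name }}!</p>
  `
})
export class HelloComponent {
  name = ''; // updated by the input, and any change here updates the view
}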

Angular makes a great recommendation for businesses as it provides a framework that works well with back-end languages and combines business logic and UI very well. Angular provides an effective cross-platform development framework that makes the development process easier and reduces cost. Though it is initially a little complex to learn, it is worth it, as it yields high-quality applications. Angular uses TypeScript, which helps developers write clean, neat code and makes fixing bugs that much easier. The framework is structured around component-based development, which supports a steady development process with consistent and highly reusable components. This further helps provide better maintainability and productivity.

The evolution of Angular has provided various new features and improvements that have significantly increased speed and optimized performance. The older versions had larger bundle sizes, which hindered fast loading of applications, but with recent improvements such as lazy-loaded modules and the Ivy renderer, we can create lightweight web applications that are faster and better. To overcome productivity issues and provide a faster development process, features such as dependency injection and Angular services are also provided. Angular keeps evolving based on requests from Google and the Angular community, making it one of the ideal frameworks for businesses and enterprises.

Angular Architecture and its core components

The Angular architecture contains the following core components:

  • Module
  • Meta Data
  • Directives
  • Pipe
  • Service
  • Decorators

Module

A module in Angular is where you group the components, directives, pipes, and services that are related to the application. Every Angular app has a root module that provides the bootstrap mechanism to launch the app.
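
A minimal root module might look like the following sketch (the component name is illustrative):

import { NgModule } from '@angular/core';
import { BrowserModule } from '@angular/platform-browser';
import { AppComponent } from './app.component';

@NgModule({
  declarations: [AppComponent], // components, directives, and pipes owned by this module
  imports: [BrowserModule],     // other modules whose exported classes are needed here
  providers: [],                // services made available to the injector
  bootstrap: [AppComponent]     // root component created when the app starts
})
export class AppModule {}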

Meta Data

Metadata is used to decorate a class and configure its expected behavior. In TypeScript, metadata is attached to a class using decorators.

Directives

Directives are classes that are used to change the behavior or view of DOM elements in Angular applications. 

Three types of Angular directives are as follows:

  1. Components – directives with a template.
  2. Attribute directives – directives that change the appearance or behavior of an element, component, or another directive (see the sketch after this list).
  3. Structural directives – directives that change the DOM layout by adding and removing DOM elements.
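
For example, an attribute directive (the selector and behavior below are illustrative) can change how its host element looks when the user hovers over it:

import { Directive, ElementRef, HostListener } from '@angular/core';

@Directive({ selector: '[appHighlight]' }) // applied as an attribute: <p appHighlight>...</p>
export class HighlightDirective {
  constructor(private el: ElementRef) {}

  @HostListener('mouseenter') onMouseEnter() {
    this.el.nativeElement.style.backgroundColor = 'yellow';
  }

  @HostListener('mouseleave') onMouseLeave() {
    this.el.nativeElement.style.backgroundColor = '';
  }
}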

Pipe

A pipe is a simple way to transform values in an Angular template. Some of the built-in pipes in Angular are CurrencyPipe, DatePipe, JsonPipe, LowerCasePipe, UpperCasePipe, PercentPipe, and SlicePipe.
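
Built-in pipes are used directly in templates, for example {{ today | date:'shortDate' }} or {{ price | currency:'USD' }}. A custom pipe is a class with the @Pipe decorator; the truncate pipe below is an illustrative sketch, not a built-in:

import { Pipe, PipeTransform } from '@angular/core';

@Pipe({ name: 'truncate' })
export class TruncatePipe implements PipeTransform {
  // Usage in a template: {{ article.title | truncate:30 }}
  transform(value: string, limit: number = 20): string {
    return value.length > limit ? value.slice(0, limit) + '…' : value;
  }
}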

Service

Services are used to share methods and properties with other components across the project, which reduces the repetition of functions and properties. HTTP requests and responses are typically handled in services.

There are two types of services in angular.

  • Built-in services – There are approximately 30 built-in services in Angular.
  • Custom services – In Angular, if users want to create their own service, they can opt for custom services (a minimal sketch follows this list).
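
A minimal custom service might look like the sketch below (the class name and endpoint URL are placeholders, and HttpClientModule must be imported in the module). Components receive the service through constructor injection, for example constructor(private users: UserService) {}, and then call this.users.getUsers().subscribe(...).

import { Injectable } from '@angular/core';
import { HttpClient } from '@angular/common/http';
import { Observable } from 'rxjs';

@Injectable({ providedIn: 'root' }) // one shared instance across the whole app
export class UserService {
  constructor(private http: HttpClient) {}

  // HTTP requests and responses are handled here, not in the components.
  getUsers(): Observable<any[]> {
    return this.http.get<any[]>('/api/users');
  }
}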

Decorators

Angular decorators are used to store metadata about a class, method, property, or parameter. There are four types of decorators in Angular:

  1. Class Decorators
  2. Property Decorators
  3. Method Decorators
  4. Parameter Decorators

1. Class Decorators

Class decorators are top-level decorators; they define the purpose of the class.

2. Property Decorators

Property decorators are applied to specific properties within a class. @Input() is an example of a property decorator.

3. Method Decorators

A method decorator adds functionality to a method within your class. @HostListener() is an example of a method decorator.

4. Parameter Decorators

Parameter decorators are used to decorate parameters, typically in class constructors. @Inject() is an example of a parameter decorator.
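
The illustrative snippet below shows all four kinds of decorators on one class (the component and its bindings are made up for demonstration):

import { Component, Input, HostListener, Inject, LOCALE_ID } from '@angular/core';

@Component({                       // class decorator: defines the purpose of the class
  selector: 'app-greeting',
  template: '<p>{{ message }} ({{ locale }})</p>'
})
export class GreetingComponent {
  @Input() message = 'Hello';      // property decorator: value can be bound from a parent

  constructor(@Inject(LOCALE_ID) public locale: string) {} // parameter decorator

  @HostListener('click')           // method decorator: reacts to clicks on the host element
  onClick(): void {
    console.log('greeting clicked');
  }
}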

Testing 

Angular uses the Jasmine testing framework, which provides multiple utilities for writing tests. Karma is the test runner; it uses a configuration file to set up the start-up, reporters, and testing framework.
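
A minimal Jasmine spec looks like the sketch below; it exercises the illustrative TruncatePipe from the Pipe section above (the import path is hypothetical) and is typically executed by Karma via ng test:

import { TruncatePipe } from './truncate.pipe';

describe('TruncatePipe', () => {
  const pipe = new TruncatePipe();

  it('shortens strings longer than the limit', () => {
    expect(pipe.transform('Angular testing with Jasmine', 7)).toBe('Angular…');
  });

  it('leaves short strings unchanged', () => {
    expect(pipe.transform('Karma', 20)).toBe('Karma');
  });
});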

Limitations of Angular

Though Angular is one of the most popular modern frameworks, it has a few limitations:

  • Steep learning curve

Though Angular is a great framework, it can be quite difficult even for people with experience in HTML, JS, and CSS, and for people who are not used to n-tier architecture. It has its own set of rules and concepts that can be hard to learn and uncomfortable for novice learners.

  • Limited SEO options

Though Angular is a great and powerful platform for building single-page applications, limited SEO options and poor accessibility for search engine crawlers are among its major drawbacks. This makes it difficult for a website to rank well in search engine results.

  • Complex and Verbose

Angular is a complex framework of modules with extensive capabilities for integration and customization. Though Angular provides an array of online tutorials and documentation, it can be uncomfortable to learn in the beginning; a slow and steady pace is recommended for learning the platform and the language.

  • Complex directives

Angular has three kinds of directives – attribute directives, structural directives, and component directives. Each has its own limitations, and it is quite complex for beginners to understand when to use which.

Angular Prerequisites

Angular requires the following prerequisites – Node.js, the Angular CLI, and a text editor.

Supported by Google

Google offers Long-Term Support (LTS) for Angular, which helps it scale up to enterprise application development. Netflix, Gmail, YouTube TV, Upwork, and other organizations also use the Angular framework for application development.

Open-Replay

OpenReplay is an open-source tool that developers can integrate with their Angular applications. It tracks sessions as users interact with the application, so developers can easily see how user-friendly the application is. It has network tracker, Redux/NgRx action tracker, and profiler plugins, and it also captures the performance of the application.

With this article we wanted to give you an introduction to Angular, the evolution of Angular over the decade, the need for Angular in today’s business world, and its core architecture. We covered the core components of the architecture and also the limitations of the framework. The bottom line is that Angular has become one of the best frameworks for developing single-page apps.

Azure App Service is a Platform as a Service (PaaS) that is used to build, deploy, and scale enterprise-grade applications such as web apps, mobile apps, logic apps, API apps, and function apps. It supports multiple programming languages and frameworks such as .NET, .NET Core, Java, Ruby, Node.js, PHP, and Python.

From a developer’s perspective, Azure App Service provides a great platform to develop, deploy, and scale applications. However, when it comes to production environments, Infrastructure as Code (IaC) comes in handy. Terraform is an open-source IaC tool with a consistent CLI that lets you write infrastructure as code using declarative configuration files and then plan and apply changes to your infrastructure to reach the required configuration state.

Terraform is a good choice as it reduces manual human error by codifying the application infrastructure. Terraform manages infrastructure across more than 300 public clouds and services, and it provides a reusable, cost-effective, and consistent environment that solves dependencies and version controls. In this article, we will take you through the process of deploying a web app in Azure App Service using Terraform.

To deploy the web app in Azure App Service using Terraform, here are the steps we need to follow:

  • Create the Resource Group
  • Create App Service plan and deploy web app

Create the Resource Group:

The first step is to create a resource group using the following Terraform code. Every resource that is created must be placed within a resource group.

terraform {    
  required_providers {    
    azurerm = {    
      source = "hashicorp/azurerm"    
    }    
  }    
} 
   
provider "azurerm" {    
  features {}    
}

resource "azurerm_resource_group" "resource_group" {
  name     = "app-service-rg"
  location = "East US"
}

Create App Service plan and deploy web app

The App Service plan defines the capacity and resources to be shared among one or more app services that are assigned to that plan. Azure WebApp must be associated with an App Service Plan as it specifies the computing resources that are required for the web app to function. The following code creates an app service plan.

resource "azurerm_app_service_plan" "app_service_plan" {
  name                = "example-appserviceplan"
  location            = azurerm_resource_group.resource_group.location
  resource_group_name = azurerm_resource_group.resource_group.name

  sku {
    tier = "Standard"
    size = "S1"
  }
}

Next, add the code for creating the App Service. The final Terraform file looks like the one below.

terraform {    
  required_providers {    
    azurerm = {    
      source = "hashicorp/azurerm"    
    }    
  }    
} 
   
provider "azurerm" {    
  features {}    
}

resource "azurerm_resource_group" "resource_group" {
  name     = "app-service-rg"
  location = "East US"
}

resource "azurerm_app_service_plan" "app_service_plan" {
  name                = "myappservice-plan"
  location            = azurerm_resource_group.resource_group.location
  resource_group_name = azurerm_resource_group.resource_group.name

  sku {
    tier = "Standard"
    size = "S1"
  }
}

resource "azurerm_app_service" "app_service" {
  name                = "mywebapp-453627"
  location            = azurerm_resource_group.resource_group.location
  resource_group_name = azurerm_resource_group.resource_group.name
  app_service_plan_id = azurerm_app_service_plan.app_service_plan.id

  #(Optional)
  site_config {
    dotnet_framework_version = "v4.0"
    scm_type                 = "LocalGit"
  }
  
  #(Optional)
  app_settings = {
    "SOME_KEY" = "some-value"
  }

}

Now, we should run the following command to initialize Terraform.

Command: terraform init

To create an execution plan, we should run the terraform plan command

Command: terraform plan -out appservice.tfplan

To apply the plan, run the following command:

Command: terraform apply "appservice.tfplan"

We can verify that the App Service was created in the specified App Service plan and resource group by checking the Azure portal.

Hope you found this article useful. Stay tuned for more articles coming up on Azure App Service and Terraform.

With the average cost of a data breach at $3.86 million last year, it’s wise to employ a good backup system. More than 80% of Fortune 500 companies use Microsoft Azure to run their businesses because it is simple, ever-evolving, secure, and cost-effective. So, in this article let’s explore how to back up and restore Azure Managed Disks using an Azure Backup vault.

Azure Backup services allow you to back up your data and recover it from the Microsoft Azure cloud. They back up and store the data in backup vaults. These backup vaults help ensure backups are successful by monitoring and tracking the storage containers, they optimize resources by automating maintenance tasks, and they provide better security and access control for storing and recovering data.

Data sources that are supported by Azure Backup include

  • Azure Database for PostgreSQL servers,
  • Azure Blobs, and
  • Azure Disks.

Prerequisites for performing disk backup and restore operations

Backup Vault’s managed identity needs the below roles to be assigned to it for performing disk backup and restore operations:

  • Disk Backup Reader role on the Source disk that needs to be backed up.
  • Disk Snapshot Contributor role on the Resource group where backups are created and managed by Azure Backup.
  • Disk Restore Operator role on the Resource group where the disk will be restored by the Azure Backup.

To assign Azure roles, the user must have Microsoft.Authorization/roleAssignments/write permissions, such as User Access Administrator or Owner.

Steps to backup managed disks:

Create a Backup vault

  1. Go to the Backup center service in the Azure portal. Backup center enables enterprises to govern, monitor, operate, and analyze backups at scale. Jobs performed in the last 24 hours are displayed in the Overview tab. Operations such as Scheduled backup, On-demand backup, and Restore are listed along with the status of each operation (Failed, In progress, or Completed).

2. Select Vault from the Overview tab

3. In Start: Create Vault page, select Backup vault and then Continue

4. In Basics tab,

  • Under PROJECT DETAILS, select the Subscription and Resource group of the vault to be created.
  • Under INSTANCE DETAILS, type in the Backup vault name.
  • Select the region of the backup vault and backup storage redundancy
  • Select Review and create

5. Select Create. The Backup vault will be created.

Create a backup policy

  1. Select Policy from Backup center’s Overview tab

2. In Start: Create Policy page, select Datasource type as Azure Disks and Vault type is prepopulated as Backup vault. Then, select Continue.

3. In the Basics tab, type in the policy name to be created. Select Datasource type as Azure Disk and select Vault as the Backup vault that was just created. Click on Next: Schedule and Retention to go to next tab.

4. In the Schedule and retention tab,

  • Under Backup schedule, select the backup schedule frequency and specify the time when backup must happen.
  • Specify the number of days backup should be retained under Retention settings.

5. After validation, in Review and Create tab, select Create. The Backup policy is created.

Configure backup of an Azure Disk

To backup an azure disk,

  1. Assign Disk Backup Reader role on the Source disk that needs to be backed up to the Backup vault’s managed identity.
  2. Assign Disk Snapshot Contributor role on the Snapshot Resource group to the Backup vault’s managed identity.

a)     Steps to Assign Disk Backup Reader role on the Source Disk

  1. Go to the source disk that we need to configure backup for.
  2. Select access control (IAM) and Add role assignment

3. In Role tab, search for the role Disk Backup Reader and select it

4. In Members tab, select assign access to Managed identity and select members as the Backup vault.

5. Select Review+ assign. Assignment of Disk Backup Reader role to the Backup vault is done.

b)     Steps to Assign Disk Snapshot Contributor role on the Snapshot Resource group

  1. Go to the target Snapshot resource group.
  2. Select access control (IAM) and Add role assignment

3. In Role tab, search for the role Disk Snapshot Contributor and select it

4. In Members tab, select assign access to Managed identity and select members as the Backup vault.

5. Select Review+ assign. Assignment of Disk Snapshot Contributor role to the Backup vault is done.

Steps to Backup an Azure Disk

  1. In Backup center, Select Backup from the Overview tab

2. In Start: Configure Backup, select Datasource type as Azure Disks and Vault type is prepopulated as Backup vault.

3. In Basics tab, select Datasource type as Azure Disks, select the Vault created, and then select Next.

4. In Backup policy tab, Select the backup policy.

5. In Datasources tab:

a) Click on Add/Edit and select the disks to backup.

b) Click Select after selection of disks.

c) Select Snapshot Resource Group, the resource group where snapshots of disks are stored. Once the disk backup is configured, the Snapshot Resource Group that’s assigned to a backup instance cannot be changed.

d) Select Validate. Click Next.

6. In Review and Configure tab, Select Configure Backup. The configuration of backup for the disk is done.

For the validation to be successful, we must assign the Disk Backup Reader role on the source disk that needs to be backed up and the Disk Snapshot Contributor role on the Snapshot Resource group to the Backup vault’s managed identity.

On-demand backup of an Azure Disk

  1. In the Backup vault, go to Backup Instances and select the disk for which to perform an on-demand backup

2. Select Backup Now.

3. In Backup vault, go to Backup jobs to view the status of the backup.

Restore an Azure Disk from backup

To restore an Azure disk, we need to assign the Disk Restore Operator role to the Backup vault’s managed identity on the resource group where Azure Backup will restore the disk.

Steps to Assign Disk Restore Operator role on the Target Resource group

  1. Go to the target resource group.
  2. Select access control (IAM) and Add role assignment

3. In the Role tab, search for the role Disk Restore Operator and select it.

4. In the Members tab, under Assign access to, select Managed identity and select the Backup vault as the member.

5. Select Review + assign. The Disk Restore Operator role is now assigned to the Backup vault.

Steps to Restore an Azure Disk

  1. Go to Backup center -> select Backup vault -> select Restore

2. In Basics tab, Select the Backup instance as the disk that needs to be restored and then Next.

3. In Select Restore Point tab, select the required or latest restore point.

4. In Restore parameters, select the Target subscription and Target resource group. Type in the Restored disk name and select Next: Review and restore.

5. After validation, click on Restore. The restore operation is started.

Note: If validation is unsuccessful, follow the steps to Assign Disk Restore Operator role on the Target Resource group.

6. Restore operation is now completed.

While adopting DevOps practices automates and optimizes processes through technology, it all starts with the culture inside the organization—and the people who play a part in it.

Check out this infographic to learn how DevOps unifies people, process, and technology to bring better products to customers faster. Then imagine how the power of GitHub and Azure can benefit your DevOps team.

Together, Microsoft GitHub and Azure DevOps provide an end-to-end experience for development teams to easily collaborate while building and releasing code to Azure, on-premises, or any cloud. Contact us today to learn more.

This infographic offers an in-depth look at how Microsoft business analytics and AI is intelligent, trusted, and flexible. This service produces faster, more accurate insights and predictions. It also offers the most secure, compliant, and scalable system. Finally, it works with what you have.

Would you like to leverage Microsoft Business Analytics and AI for faster, more accurate insights and predictions? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you. Contact us today to learn more.

Seattle [23 Jun 2021] – CloudIQ Technologies Inc today announced it has earned the Kubernetes on Microsoft Azure advanced specialization, a validation of a solution partner’s deep knowledge, extensive experience and proven expertise in deploying and managing production workloads in the cloud using containers and managing hosted Kubernetes environments in Microsoft Azure.

Only partners that meet stringent criteria around customer success and staff skilling, as well as pass a third-party audit of their container-based workload deployment and management practices, are able to earn the Kubernetes on Azure advanced specialization.

With over 75% of global organizations expected to run containerized applications in production by 2022, many are looking for a partner with advanced skills to migrate their existing containerized workloads to the cloud, or assist them in developing cloud-native applications using container technologies, DevOps patterns, and a microservices approach.

“With our deep expertise in cloud-native architecture design, we help clients build and run scalable applications with improved security, faster release cycles, easier management, and lower costs”, said Mr. Prem Kandalu, CEO. “As a partner who has earned the Kubernetes on Microsoft Azure advanced specialization, CloudIQ will pass on the benefits of our continued collaboration with Microsoft to our clients.”

Rodney Clark, Corporate Vice President, Global Partner Solutions, Channel Sales and Channel Chief at Microsoft added, “The Kubernetes on Microsoft Azure advanced specialization highlights the partners who can be viewed as most capable when it comes to deploying and managing containerized applications in Azure. CloudIQ Technologies clearly demonstrated that they have both the skills and the experience to deliver best-in-class cloud-native capabilities to customers with Azure.”

About CloudIQ Technologies

CloudIQ is a leading cloud consulting and solutions firm that helps businesses envision new products, create innovative business models, and deliver the next level of customer experiences by leveraging the cloud. From standalone cloud projects to enterprise-wide cloud architecture design you can rely on our cloud engineering expertise.

As a Microsoft Gold Partner (Cloud Platform), earner of the Modernization of Web Applications on Microsoft Azure and Kubernetes on Microsoft Azure advanced specializations, Kubernetes Certified Service Provider (KCSP) and Kubernetes Training Partner (KTP), we serve as trusted advisors to Fortune 500 organizations and leverage our deep industry expertise in building cloud-native solutions to help our clients realize the cost, scale and security benefits of the cloud.

Without artificial intelligence (AI), organizing and extracting insights from vast amounts of enterprise data would be a nearly impossible task. Choosing the right AI capabilities is essential to successful initiatives. This infographic presents the four guiding principles behind Microsoft #Azure #AI and why it remains the top choice for today’s leading enterprises.

Would you like to leverage Azure AI for your business? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you transform massive amounts of raw information into meaningful insights for your business. Contact us today to learn more.

Have you been looking for a fully managed, secure platform for your web apps? Azure App Service is built to help you build, deploy, and scale your web apps and APIs on your terms. Work with .NET, .NET Core, Node.js, Java, Python, or PHP, in containers or running on Windows or Linux. Check out this infographic and contact CloudIQ Technologies to learn more.

Would you like to modernize your apps using Azure App Service? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you. Contact us today to learn more.

Azure SQL Database is intelligent and always up to date. Azure is the only cloud with evergreen SQL, which never needs to be patched or updated. This infographic presents the benefits of Azure SQL Database and Azure Advanced Threat Protection.

Would you like to migrate SQL Server databases to the Azure cloud? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help address any of your IT infrastructure upgrade needs. Contact us today to learn more.

Moving Windows Server and SQL workloads to Azure provides flexible, scalable, and highly available cloud infrastructure. It also supports rapid innovation and digital transformation, freeing you to focus on your mission. This infographic presents the benefits of running Windows Server and SQL Server on Azure. 

Would you like to move your Windows Server and SQL workloads to Azure? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help address any of your IT infrastructure upgrade needs. Contact us today to learn more.

There are four different ways of accessing Azure Data Lake Storage Gen2 in Databricks, but using the ADLS Gen2 storage account access key directly is the most straightforward option. Before we dive into the actual steps, here is a quick overview of the entire process:

  • Understand the features of Azure Data Lake Storage (ADLS)
  • Create ADLS Gen 2 using Azure Portal
  • Use Microsoft Azure Storage Explorer
  • Create Databricks Workspace
  • Integrate ADLS with Databricks
  • Load Data into a Spark DataFrame from the Data Lake
  • Create a Table on Top of the Data in the Data Lake

Microsoft Azure Data Lake Storage (ADLS) is a fully managed, elastic, scalable, and secure file system that supports HDFS semantics and works with the Apache Hadoop ecosystem. It is built for running large-scale analytics systems that require large computing capacity to process and analyze large amounts of data.

Features:

Limitless storage

ADLS is suitable for storing all types of data coming from different sources like devices, applications, and much more. It also allows users to store relational and non-relational data. Additionally, it doesn’t require a schema to be defined before data is loaded into the store. ADLS can store virtually any size of data, and any number of files. Each ADLS file is sliced into blocks and these blocks are distributed across multiple data nodes. There is no limitation on the number of blocks and data nodes.

Auditing

ADLS creates audit logs for all operations performed in it.

Access Control

ADLS provides access control through the support of access control lists (ACL) on files and folders stored in its infrastructure. It also manages authentication through the integration of AAD based on OAuth tokens from supported identity providers.

Create ADLS Gen2 using Portal:

  1. Log in to the portal.
  2. Search for “Storage Account”
  3. Click “Add”

4. Choose Subscription and Resource Group.

5. Give storage account name, location, kind, and replication.

6. In the Advanced Tab, set Hierarchical namespace to Enabled

7. Click “Review+Create”
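If you prefer to script the storage account creation instead of using the portal, a minimal sketch with the azure-mgmt-storage Python SDK is shown below. The subscription, resource group, account name, and location are placeholders, and you should verify the parameter names against the SDK version you install.

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# Placeholders - replace with your own values
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
account_name = "<storage-account-name>"

client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Enabling the hierarchical namespace (is_hns_enabled) is what makes this an ADLS Gen2 account
poller = client.storage_accounts.begin_create(
    resource_group,
    account_name,
    {
        "location": "eastus",
        "kind": "StorageV2",
        "sku": {"name": "Standard_LRS"},
        "is_hns_enabled": True,
    },
)
account = poller.result()
print(account.name, account.provisioning_state)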

Microsoft Azure Storage Explorer

Microsoft Azure Storage Explorer is a standalone app that makes it easy to work with Azure Storage data on Windows, macOS, and Linux. Microsoft has also provided this functionality within the Azure portal, where it is currently in preview mode.

  1. Navigate back to your data lake resource in Azure and click ‘Storage Explorer (preview)’.

2. Right-click on ‘CONTAINERS’ and click ‘Create file system’. This will be the root path for our data lake.

3. Name the file system and click ‘OK’.

4. Now, click on the file system you just created and click ‘New Folder’. This is how we will create our base data lake zones. Create folders.

5. To upload data to the data lake, you will need to install Azure Data Lake explorer using the following link.

6. Once you install the program, click ‘Add an account’ in the top left-hand corner, log in with your Azure credentials, keep your subscriptions selected, and click ‘Apply’.

7. Navigate down the tree in the explorer panel on the left-hand side until you get to the file system you created and double-click on it. Then navigate into the folder. There you can upload/download files from your local system.

8. Click “Upload” > “Upload Files”. You can get sample data set from here.

Sample Folder structure:

Create Databricks Workspace

  1. On the Azure home screen, click ‘Create a Resource’

2. In the ‘Search the Marketplace’ search bar, type ‘Databricks’ and you should see ‘Azure Databricks’ pop up as an option. Click that option.

3. Click ‘Create’ to begin creating your workspace.

4. Use the same resource group you created or selected earlier. Then, enter a workspace name.

5. Select ‘Review and Create’.

6. Once the deployment is complete, click ‘Go to resource’ and then click ‘Launch Workspace’ to get into the Databricks workspace.

Integrate ADLS with Databricks:

There are four ways of accessing Azure Data Lake Storage Gen2 in Databricks:

  1. Mount an Azure Data Lake Storage Gen2 filesystem to DBFS using a service principal and OAuth 2.0.
  2. Use a service principal directly.
  3. Use the Azure Data Lake Storage Gen2 storage account access key directly.
  4. Pass your Azure Active Directory credentials, also known as a credential passthrough.

Let’s use option 3.

1. This option is the most straightforward and requires you to run a command that sets the data lake context at the start of every notebook session. Databricks Secrets can be used when setting these configurations.

2. To set the data lake context, create a new Python notebook, and paste the following code into the first cell:

spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    ""
)

3. Replace ‘<storage-account-name>’ with your storage account name.

4. In between the double quotes on the third line, we will be pasting in an access key for the storage account that we grab from Azure

5. Navigate to your storage account in the Azure Portal and click on ‘Access keys’ under ‘Settings’.

6. Click the copy button, and paste the key1 value in between the double quotes in your cell.
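Alternatively, rather than pasting the raw key into the notebook, you can store it in a Databricks secret scope (the Databricks Secrets mentioned in step 1) and read it at runtime. A minimal sketch, assuming a secret scope named adls-scope and a secret named storage-account-key that you have already created:

# Hypothetical scope and key names - create them beforehand with the Databricks CLI or API
storage_account_key = dbutils.secrets.get(scope="adls-scope", key="storage-account-key")

spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    storage_account_key
)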

7. Attach your notebook to the running cluster and execute the cell. If it worked, you should see the following:

8. If your cluster is shut down, or if you detach the notebook from a cluster, you will have to re-run this cell to access the data.

9. Copy the below command into a new cell, filling in your relevant details, and you should see a list containing the file you uploaded.

dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>")

Load Data into a Spark DataFrame from the Data Lake

Towards the end of the Microsoft Azure Storage Explorer section, we uploaded a sample CSV file into ADLS. We will now see how we can read this CSV file from Spark.

We can get the file location from the dbutils.fs.ls command we ran earlier – see the full path as the output.

Run the command given below:

#set the data lake file location:
file_location = "abfss://[email protected]/raw/covid19/johns-hopkins-covid-19-daily-dashboard-cases-by-states.csv"

#read in the data to dataframe df
df = spark.read.format("csv").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

#display the dataframe
display(df)
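To quickly confirm that the file was read as expected, you can inspect the inferred schema and the row count:

# Verify the inferred schema and the number of rows loaded
df.printSchema()
print(df.count())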

Create a table on top of the data in the data lake

In the previous section, we loaded the data from a CSV file into a DataFrame so that it can be accessed using the Python Spark API. Now, we will create a Hive table in Spark with the data in an external location (ADLS), so that the data can be accessed using SQL instead of Python code.

In a new cell, copy the following command:

%sql
CREATE DATABASE covid_research

Next, create the table pointing to the proper location in the data lake.

%sql
CREATE TABLE IF NOT EXISTS covid_research.covid_data
USING CSV
LOCATION 'abfss://[email protected]/raw/covid19/johns-hopkins-covid-19-daily-dashboard-cases-by-states.csv'

 You should see the table appear in the data tab on the left-hand navigation pane.

The CREATE TABLE statement can also be given an OPTIONS clause so that the header row is used and the column types are inferred from the CSV file (drop and recreate the table if it already exists). Then run a select statement against the table.

%sql
CREATE TABLE IF NOT EXISTS covid_research.covid_data
USING CSV
LOCATION 'abfss://[email protected]/raw/covid19/johns-hopkins-covid-19-daily-dashboard-cases-by-states.csv'
OPTIONS (header "true", inferSchema "true")
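A minimal select statement against the table created above might look like this:

%sql
SELECT * FROM covid_research.covid_data LIMIT 10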

That concludes our step-by-step guide on accessing Azure Data Lake Storage Gen2 in Databricks, using the ADLS Gen2 storage account access key directly.

Hope you found this guide useful, stay tuned for more.

The scalability, flexibility, cost-efficiency, and improved performance that come with moving to the cloud are becoming too attractive for companies to ignore. Many organizations have started moving to the cloud as a cost-effective option to manage their IT portfolio, avoid expensive capex for the purchase of new servers, and remove the complexities of managing on-premises architecture.

Irrespective of a company’s size, migrating to the cloud is certainly quite an undertaking. Fortunately, Microsoft has created a unique platform with a range of tools to help make the migration fast and smooth, while minimizing the risk and impact to your business.

Microsoft Azure is one of the leading cloud computing service providers that allows businesses to use cloud resources on a pay per use model, therefore you can pay for only what you need and how long you need it. Azure provides multiple options right from Infrastructure as a Service (IaaS) to Platform as a Service (PaaS) to Software as a Service (SaaS), so you can choose from a simple lift and shift approach to a more complex application modernization approach.

The migration journey begins with an assessment of your current setup. Azure provides tools like Azure Migrate for this purpose, and additionally you can leverage other assessment tools from the Azure migration partner ecosystem. Azure Migrate provides a centralized hub to assess and migrate on-premises servers, infrastructure, applications, and data to Azure. With Azure, you can also assess how your workloads will perform, and plan and implement your migration strategy accordingly.

5 Azure Migration Strategies

Here are five strategies that are adopted widely for migrating an application to Azure cloud.

1. Rehosting

Commonly known as “lift and shift”, this is an approach to migrate applications from an on-premise environment to the cloud with no changes to the underlying applications. This is the most popular migration approach as it allows quick migration with little risk of disruption by employing real-time replication during the transition process.

2. Refactoring

This is also known as “repacking”. It involves making small changes to the code and configuration of the application to ensure they are more compatible with the cloud so you can connect them easily to Azure-native infrastructure. This can improve the scalability and maximize the operational cost-efficiency of the platform.

3. Rearchitecting

Also known as “redesigning”, this strategy involves modifying or extending the code base of an application to optimize it to run on Azure. Rearchitecting is a time-consuming migration approach, but it offers far greater scalability.

4. Rebuilding

This strategy involves discarding the old application and rebuilding an application or workload from the ground up using the Azure Platform as a Service (PaaS). In this migration strategy, you manage the applications and services you develop, while Azure manages the platform and infrastructure required to run it.

5. Replacing

Under this approach, all the underlying infrastructure, middleware, application software, and application data are in the cloud and managed by Azure in Microsoft datacenters. This is used for greater efficiency and scalability.

3-Step Migration Process

Once you decide on your migration approach, the actual migration to the cloud is a 3-step process (Assess – Migrate – Optimize). Now before you get started with the migration there are a few preliminary considerations to ensure your cloud environment is ready to receive your workloads. You need to ensure that your virtual data center in the cloud contains the elements that are comparable to your on-premises environment. Building the virtual data center in the cloud is a streamlined process and it includes the following

1. Identity

To ensure authenticated access for users between your on-premises environment and the workloads that you have migrated to the cloud, you need an identity management solution that spans both. For this purpose, you can use Azure Active Directory (Azure AD) or other similar solutions.

2. Storage

Migrating to the cloud requires a storage platform that meets the performance needs of your migrated workloads. You can choose from different storage types and configure exact storage requirements based on workloads to ensure security and reliability. You just need to enter a few details to get the right storage for your migration project.

3. Networking

Networking is the backbone of the data center. When migrating to the cloud, you need to keep the applications in the same subnets and IP address ranges to ensure a seamless migration.  You can create a virtual network to maintain the same performance and stability you had in the on-premise data center.

4. Connectivity

During migration, you’ll transfer a large amount of data to the cloud. So it would be wise to opt for a dedicated connectivity option to help with smooth data transfer and have the best user experience. For this purpose, you can use Azure ExpressRoute as it helps in a faster, private connection to Azure and ensures performance and security.

Now it’s time to begin your migration journey to the cloud.

Migration Phase 1: Assessment

Now that you have a better understanding of Azure and how it fits into your migration strategy, it’s time to assess your existing infrastructure. Here are four steps to do that

1. Identification of application and server dependencies

Begin with inventory and assessment of on-premises IT resources to identify opportunities to optimize the IT environment and prioritize which applications and workloads are ideal for migration. Determining your priorities and objectives early can help you have a seamless migration process.

2. Assessment of on-premises applications and servers

Your organization may run hundreds or thousands of servers and virtual machines. You need consolidated planning and a perfect tool to shift them to the cloud. Microsoft offers Azure Migrate service to provide automation for the assessment of on-premises workloads. Ultimately, the goal of this assessment phase is to collect server and application information, including configuration and usage.

3. Configuration analysis

Configuration analysis will help you understand which of your workloads can be migrated with no modifications, which ones require a few modifications, and which workloads are incompatible with the current installation. Essentially this step helps you ensure the proper functioning of the workloads on the cloud.

4. Cost planning

The final step of the assessment phase is to collect resource usage such as CPU, memory, and storage to forecast costs and expenditures. This helps in ascertaining the actual usage of your workload and ensure that your choice meets both performance and economic targets.

Migration Phase 2: Execute Migration

After you’ve completed discovery and assessment, now it’s time to prepare for the next step – migration.  The lift-and-shift method most often employed for server or VM migration is real-time replication, due to its flexibility and capability in staged migration.

1. Real-time Replication

This involves creating a copy of the workload in the cloud and allowing asynchronous replication to keep the copy and the workload in sync. Replication also lets groups of virtual machines be connected to the cloud. Real-time replication also allows the old workload to remain online and accessible during the migration to ensure zero disruptions.

2. Testing

Once the replication is complete, start your application or workloads using an isolated environment that mimics the cloud production environment. It lets you test the application without impacting the on-premise as well as cloud production systems. When you’re fully satisfied, it’s time to perform the final migration.

Migration Phase 3: Optimize

Once the migration phase is complete, you need to ensure a seamless transition of operating workloads in the cloud. This is what the optimize phase is all about.

1. Secure cloud resources

Know the security controls and the capabilities of the new cloud-based application, to ensure that the security measures are working, and responding properly. You should become familiar with the capabilities of the Azure Security Center like centralized policy management, continuous security assessment, actionable recommendations, and more.

2. Protecting Data

Ensure that the workloads and data have proper backup, disaster recovery, encryption, and other measures in place to protect your business from risks. Azure offers multiple mechanisms like Azure Disk Encryption, Azure Backup, and Azure Site Recovery to protect your data.

3. Monitoring Cloud Health

Azure offers many monitoring services to ensure you have full visibility into your current system status and get unique insights into your applications and infrastructure. The basic monitoring services include Azure Monitor, Service Health, and Azure Advisor. A few of the premium monitoring services include Application Insights, Azure Log Analytics, and Network Watcher.

There are many options and reasons for migrating workloads to Azure.

With this straightforward guide, migration to Azure wouldn’t be a complex task anymore. By having a proper plan and mapping out the key objectives, you can ensure a successful Azure migration.

Want to know the key benefits of using Windows Server and SQL Server with Microsoft Azure? Here are four of them: reducing costs and streamlining IT resources; modernizing by migrating to a flexible, open cloud; innovating new apps or managing existing server apps with unlimited flexibility; and ensuring data protection, security, and business continuity.

Would you like to modernize digital processes to improve profitability and ensure data security? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you. Contact us today to learn more.

Azure Databricks provides comprehensive end-to-end diagnostic logs of activities performed by Azure Databricks users, allowing your enterprise to monitor detailed Azure Databricks usage patterns.

In this article, we’re going to look at sending the logs of Azure Databricks workspace to log analytics workspace using diagnostics settings present in the Databricks workspace.

Here are the pre-requisites and steps to enable diagnostics setting for Azure Databricks

Pre-requisites:
  1. A user with Owner or Contributor access on the resource group or subscription where the Databricks workspace is deployed.
  2. A Databricks workspace on the Premium plan (diagnostic logging for the Azure Databricks service is available only for the Premium plan).
Steps:
  1. Log in to the Azure portal.
  2. Select the Databricks workspace.

3. Select Diagnostic settings.

4. Now click “+ Add diagnostic setting”.

5. Azure Databricks provides diagnostic logs for the following services:

  • DBFS
  • Clusters
  • Pools
  • Accounts
  • Jobs
  • Notebook
  • SSH
  • Workspace
  • Secrets
  • SQL Permissions

6. Here we are going to send the logs to the log analytics workspace.

7. Select all the logs you want and send them to log analytics. Here we’re sending cluster logs.

8. Click Save.

9. Allow some time to ingest the logs to log analytics workspace.

10. Now go to the log analytics workspace where the diagnostics are configured.

11. Select Logs. Now, using KQL, we can query the data sent from the Databricks workspace.

12. The Databricks log tables are found under the LogManagement category.
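The same cluster logs can also be queried programmatically with the azure-monitor-query Python SDK. The workspace ID is a placeholder, and the table and column names below (DatabricksClusters, ActionName) are assumptions that you should verify against the tables you see under the LogManagement category:

from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Assumed table and column names - check the LogManagement category in your workspace
query = """
DatabricksClusters
| where TimeGenerated > ago(1d)
| summarize Count = count() by ActionName
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query=query,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)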

Databricks Monitoring Dashboard

Here is a simple Databricks monitoring dashboard we created for:

  • Cluster availability
  • Failed job trend
  • Success vs failed job trend

We hope this article helps you set up the right configuration to send the logs of an Azure Databricks workspace to a Log Analytics workspace and build a Databricks monitoring dashboard.

Did you know migrating to Microsoft Azure can reduce your data center footprint by 73%?

Take advantage of your current investments and IT skills in Microsoft technologies. Microsoft applications and Azure have been built to work better together with flexibility, high compatibility and hybrid capabilities. 

Check out this infographic to learn how you can get unparalleled cost savings, easily plan migrations, avoid the complexity of multi-vendor support, and modernize your applications in the cloud from the leader you already trust.

Migrate to Azure at your own pace with confidence and support from CloudIQ. At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you modernize workloads with Azure. Contact us today to learn more.

Break down the cloud journey with four stages of the process—starting with a pre-migration assessment and then looking at migration, post-migration, and optimization. Microsoft Azure has you covered with tools created specifically for you.

Would you like to modernize your apps and data on Azure? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you. Contact us today to learn more.

It’s time to create and implement business and technology strategies powered by the cloud. Here is your complete game plan. Check the “Cloud Adoption Framework” infographic to plan your strategy to modernize and innovate. 

Would you like to migrate to the cloud? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you successfully adopt the cloud. Contact us today to learn more.

In our previous blog on getting started with Azure Databricks, we looked at Databricks tables.  In this blog, we will look at a type of Databricks table called Delta table and best practices around storing data in Delta tables.

1. Delta Lake

Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs. Databricks Delta table is a table that has a Delta Lake as the data source similar to how we had a CSV file as a data source for the table in the previous blog.

2. Table which is not partitioned

When we create a delta table and insert records into it, Databricks loads the data into multiple small files.  You can see the multiple files created for the table “business.inventory” below

3. Partitioned table

Partitioning involves putting different rows into different tables.  E.g., if we have an address table with addresses in the US, the addresses might be stored in 50 different tables corresponding to the 50 states in the US.  A view with a union might be created over all of them to provide a complete view of all addresses.

Sample code to create a table partitioned by date column is given below:

CREATE TABLE events (
  date DATE,
  eventId STRING,
  eventType STRING,
  data STRING)
USING delta
PARTITIONED BY (date) 
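For reference, the same partitioned layout is produced when writing a DataFrame from PySpark. A small sketch that creates an illustrative table (events_partitioned) with rows matching the definition above:

from datetime import date
from pyspark.sql import Row

# Sample rows matching the events table definition above
events = spark.createDataFrame([
    Row(date=date(2013, 1, 1), eventId="e1", eventType="click", data="{}"),
    Row(date=date(2013, 1, 2), eventId="e2", eventType="view", data="{}"),
])

# partitionBy creates one folder per distinct value of the partition column
events.write.format("delta").mode("overwrite").partitionBy("date").saveAsTable("events_partitioned")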

The table “business.sales” given below is partitioned by InvoiceDate.  You can see that there is a folder created for each InvoiceDate and within the folders, there are multiple files that store the data for this table.

This partitioning will be useful when we have queries selecting records from this table with InvoiceDate in the WHERE clause.

E.g.:
SELECT SLSDA_ID, RcdType, DistId
FROM business.sales
WHERE InvoiceDate = '2013-01-01'

In total there are 40,545 files for this table, as you can see from the screenshot below.

4. OPTIMIZE

SMALL FILE PROBLEM

Historical and new data is often written in very small files and directories.  This data may be spread across a data center or even across the world (that is, not co-located).  The result is that a query on this data may be very slow due to

  • network latency
  • volume of file metadata

The solution is to compact many small files into one larger file.

OPTIMIZE command invokes the bin-packing (Compaction) algorithm to coalesce small files into larger ones.  Small files are compacted together into new larger files up to 1GB.
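For example, compaction is triggered with a single SQL statement (shown here through spark.sql in a Python notebook), using the business.sales table from the earlier example; restricting the command to a partition with a WHERE clause is optional:

# Compact small files across the whole table
spark.sql("OPTIMIZE business.sales")

# Or compact only a specific partition
spark.sql("OPTIMIZE business.sales WHERE InvoiceDate = '2013-01-01'")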

You can see below that the OPTIMIZE command has removed the 40,545 files and added 2,378 files in their place. Also, observe that after optimization the size of the table has decreased from 1.49 GB to 1.08 GB.

5. Optimize table which is not partitioned

Optimize will compact the small files for tables that are not partitioned too.

The business.finance_transactions_silver table is not partitioned and currently has 64 files with a total size of 858 MB.

Running the OPTIMIZE command coalesces the 64 files into 1 file.

Note that “partitionsOptimized” is 1 in this case. Previously, for the partitioned table, “partitionsOptimized” was 2509. The OPTIMIZE command coalesces the small files within a partition only. If the table is not partitioned, the whole table is considered as a single partition.

6. ZORDER
  • Data Skipping is a performance optimization that aims at speeding up queries that contain filters (WHERE clauses).
    As new data is inserted into a Databricks Delta table, file-level min/max statistics are collected for all columns (including nested ones) of supported types. Then, when there’s a lookup query against the table, Databricks Delta first consults these statistics to determine which files can safely be skipped.  This is done automatically and no specific commands are required to be run for this.
  • Z-Ordering is a technique to co-locate related information in the same set of files.
    Z-Ordering maps multidimensional data to one dimension while preserving the locality of the data points.

Given a column that you want to perform ZORDER on, say OrderColumn, Delta

  • Takes existing parquet files within a partition.
  • Maps the rows within the parquet files according to OrderColumn using the Z-order curve algorithm.
  • In the case of only one column, the mapping above becomes a linear sort
  • Rewrites the sorted data into new parquet files.

Note: We cannot use the table partition column also as a ZORDER column.

Syntax for ZORDER is

OPTIMIZE tablename
ZORDER BY (OrderColumn) 
7. Best practices

a. PARTITION BY

  • Partition the table by a column which is used in the WHERE clause or ON clause (join).  The most commonly used partition column is the date.
  • Use columns with low cardinality.  If the cardinality of a column will be very high, do not use that column for partitioning. For example, if you partition by a column userId and if there can be 1M distinct user IDs, then that is a bad partitioning strategy.
  • Amount of data in each partition: You can partition by a column if you expect data in that partition to be at least 1 GB.  Partitioning is not required for smaller tables.
  • Prefer to PARTITION BY a single column only.

b. OPTIMIZE

  • OPTIMIZE is required for all tables to which we write data continuously on a daily basis.
  • OPTIMIZE is not required for tables that have static data/reference data which are rarely updated.
  • There is a cost associated with OPTIMIZE (Running Optimize command for sales took 6.64 minutes).  We should run it more often (daily) if we want better end-user query performance.  We should run it less often if we want to optimize costs.

c. ZORDER BY

  • If we expect a column to be commonly used in query predicates and if that column has high cardinality (that is, a large number of distinct values), then use ZORDER BY.
  • We can specify multiple columns for ZORDER BY as a comma-separated list. However, the effectiveness of the locality drops with each additional column.

Migrating your IT infrastructure to the cloud has a ton of benefits. Whether you’re looking to improve security and become GDPR compliant, cut your total cost of ownership, or promote teamwork and innovation by integrating AI capabilities, the cloud provides a solution to your IT problems.

Would you like to upgrade your IT infrastructure to the cloud? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help address any of your IT infrastructure upgrade needs. Contact us today to learn more.

Backing up on-premises resources to the cloud leverages the power and scale of the cloud to deliver high-availability with no maintenance or monitoring overhead. With Azure Backup service the benefits keep adding up, right from data security to centralized monitoring and management.

Azure Backup service uses the MARS agent to back up files, folders, and system state from on-premises machines and Azure VMs. The backups are stored in a Recovery Services vault.

In this article, we will look at how to back up on-premise files and folders using Microsoft Azure Recovery Services (MARS) agent.

There are two ways to run the MARS agent:

  • Directly on on-premises Windows machines.
  • On Azure VMs that run Windows side by side with the Azure VM backup extension.

Here is the step by step process.

Create a Recovery Services vault

1. Sign in to the Azure portal.
2. On the Recovery Services vaults dashboard, select “Add”.

3. The Recovery Services vault dialog box opens. Provide values of the Name, Subscription, Resource group, and Location.
4. Select “Create”.

Download the MARS agent

1. Download the MARS agent so that you can install it on the machines that you want to back up.
2. In the vault, select “Backup”.
3. Select On-premises for “Where is your workload running?”
4. Select Files and folders for “What do you want to back up?”
5. Select “Prepare Infrastructure”.

6. For “Prepare infrastructure”, under Install Recovery Services agent, download the MARS agent.
7. Select “Already downloaded or using the latest Recovery Services Agent”, and then download the vault credentials.
8. Select “Save”.

Install and register the agent

1. Run the MARSagentinstaller.exe file on the VM.
2. In the “MARS Agent Setup Wizard”, select “Installation Settings”.
3. Choose where to install the agent and choose a location for the cache. Select “Next”.

  • The cache is for storing data snapshots before sending them to recovery services vault.
  • The cache location should have free space equal to at least 5 percent of the size of the data you’ll back up.

4. For Proxy Configuration, specify how the agent that runs on the Windows machine will connect to the internet. Then select “Next”.

5. For Installation, review, and select “Install”.
6. After the agent is installed, select “Proceed to Registration”.

7. In Register Server Wizard > Vault Identification, browse to and select the credentials file. Then select “Next”.

8. On the “Encryption Setting” page, specify a user-defined passphrase, which is used to encrypt and decrypt backups for the machine.

9. Save the passphrase in a secure location. It is needed while restoring a backup.
10. Select “Finish”.

Create a backup policy

Steps to create a backup policy:

1. Open the MARS agent console.
2. Under “Actions”, select “Schedule Backup”.

3. In the “Schedule Backup Wizard”, select “Getting started” and click “Next”.
4. Under “Select Items to Back up”, select “Add Items”.

5. Select items to back up, and select OK.

6. On the “Select Items to Back Up” page, select “Next”.
7. Specify when to take daily or weekly backups on the “Specify Backup Schedule” page and select “Next”.
8. It is possible to schedule up to three backups per day, and weekly backups can be run as well.

9. On the “Select Retention Policy” page, specify how to store copies of your data. And select “Next”.
10. On the Confirmation page, review the information, and then select “Finish”.

11. After the wizard finishes creating the backup schedule, select “Close”.

We hope this step by step guide helps you back up on-premise files and folders using Microsoft Azure Recovery Services (MARS) agent.

Many businesses are struggling to find the talent and capacity to create and manage their machine learning models to actually unlock the insights within their data. Here is an infographic that shows how Microsoft Azure Machine Learning streamlines this process to make modeling accessible to all businesses. 

What’s holding your business back from using AI to turn your data into actionable insights? Azure Machine Learning makes AI more accessible to businesses of all sizes and experience levels by reducing cost and helping you create and manage your models. But you don’t have to go it alone. We can help you assess your business needs and adopt the right AI solution. Contact us today to learn how we can help transform your business with AI.

The distributed nature of cloud applications requires a messaging infrastructure that connects the components and services, ideally in a loosely coupled manner in order to maximize scalability. In this article let’s explore the asynchronous messaging options in Azure.

At an architectural level, a message is a datagram created by an entity (producer), to distribute information so that other entities (consumers) can be aware and act accordingly. The producer and the consumer can communicate directly or optionally through an intermediary entity (message broker).  

Messages can be classified into two main categories. If the producer expects an action from the consumer, that message is a command. If the message informs the consumer that an action has taken place, then the message is an event.

Commands

The producer sends a command with the intent that the consumer(s) will perform an operation within the scope of a business transaction.

A command is a high-value message and must be delivered at least once. If a command is lost, the entire business transaction might fail. Also, a command shouldn’t be processed more than once. Doing so might cause an erroneous transaction. A customer might get duplicate orders or billed twice.

Commands are often used to manage the workflow of a multistep business transaction. Depending on the business logic, the producer may expect the consumer to acknowledge the message and report the results of the operation. Based on that result, the producer may choose an appropriate course of action.

Events

An event is a type of message that a producer raises to announce facts.

The producer (known as the publisher in this context) has no expectations that the events will result in any action.

Interested consumer(s), can subscribe, listen for events, and take actions depending on their consumption scenario. Events can have multiple subscribers or no subscribers at all. Two different subscribers can react to an event with different actions and not be aware of one another.

The producer and consumer are loosely coupled and managed independently. The consumer isn’t expected to acknowledge the event back to the producer. A consumer that is no longer interested in the events, can unsubscribe. The consumer is removed from the pipeline without affecting the producer or the overall functionality of the system.

There are two categories of events:

  • The producer raises events to announce discrete facts. A common use case is event notification. For example, Azure Resource Manager raises events when it creates, modifies, or deletes resources. A subscriber of those events could be a Logic App that sends alert emails.
  • The producer raises related events in a sequence, or a stream of events, over a period of time. Typically, a stream is consumed for statistical evaluation. The evaluation can be done within a temporal window or as events arrive. Telemetry is a common use case, for example, health and load monitoring of a system. Another case is event streaming from IoT devices.

A common pattern for implementing event messaging is the Publisher-Subscriber pattern.

Role and benefits of a message broker

An intermediate message broker provides the functionality of moving messages from producer to consumer and can offer additional benefits.

Decoupling

A message broker decouples the producer from the consumer in the logic that generates and uses the messages, respectively. In a complex workflow, the broker can encourage business operations to be decoupled and help coordinate the workflow.

Load balancing

Producers may post a large number of messages that are serviced by many consumers. Use a message broker to distribute processing across servers and improve throughput. Consumers can run on different servers to spread the load. Consumers can be added dynamically to scale out the system when needed or removed otherwise.

Load leveling

The volume of messages generated by the producer or a group of producers can be variable. At times there might be a large volume causing spikes in messages. Instead of adding consumers to handle this work, a message broker can act as a buffer, and consumers gradually drain messages at their own pace without stressing the system.

Reliable messaging

A message broker helps ensure that messages aren’t lost even if communication fails between the producer and consumer. The producer can post messages to the message broker and the consumer can retrieve them when communication is re-established. The producer isn’t blocked unless it loses connectivity with the message broker.

Resilient messaging

A message broker can add resiliency to the consumers in your system. If a consumer fails while processing a message, another instance of the consumer can process that message. The reprocessing is possible because the message persists in the broker.

Technology choices for a message broker

Azure provides several message broker services, each with a range of features.

Azure Service Bus

Azure Service Bus queues are well suited for transferring commands from producers to consumers. Here are some considerations.

Pull model

A consumer of a Service Bus queue constantly polls Service Bus to check if new messages are available. The client SDKs and Azure Functions trigger for Service Bus abstract that model. When a new message is available, the consumer’s callback is invoked, and the message is sent to the consumer.

Guaranteed delivery

Service Bus allows a consumer to peek the queue and lock a message from other consumers.

It’s the responsibility of the consumer to report the processing status of the message. Only when the consumer marks the message as consumed, Service Bus removes the message from the queue. If a failure, timeout, or crash occurs, Service Bus unlocks the message so that other consumers can retrieve it. This way messages aren’t lost in the transfer.
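A minimal sketch of this receive-and-settle pattern with the azure-servicebus Python SDK is shown below; the connection string and queue name are placeholders:

from azure.servicebus import ServiceBusClient

# Placeholders - use your own namespace connection string and queue name
conn_str = "<service-bus-connection-string>"
queue_name = "<queue-name>"

with ServiceBusClient.from_connection_string(conn_str) as client:
    with client.get_queue_receiver(queue_name) as receiver:
        for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
            print(str(msg))
            # Marking the message as consumed removes it from the queue.
            # If this is never called, Service Bus unlocks the message and
            # another consumer can retrieve it.
            receiver.complete_message(msg)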

Message ordering

If you want consumers to get the messages in the order they are sent, Service Bus queues guarantee first-in-first-out (FIFO) ordered delivery by using sessions. A session can have one or more messages.

Message persistence

Service bus queues support temporal decoupling. Even when a consumer isn’t available or unable to process the message, it remains in the queue.

Checkpoint long-running transactions

Business transactions can run for a long time. Each operation in the transaction can have multiple messages. Use checkpointing to coordinate the workflow and provide resiliency in case a transaction fails.

Hybrid solution

Service Bus bridges on-premises systems and cloud solutions. On-premises systems are often difficult to reach because of firewall restrictions. Both the producer and consumer (either can be on-premises or the cloud) can use the Service Bus queue endpoint as the pickup and drop off location for messages.

Topics and subscriptions

Service Bus supports the Publisher-Subscriber pattern through Service Bus topics and subscriptions.

Azure Event Grid

Azure Event Grid is recommended for discrete events. Event Grid follows the Publisher-Subscriber pattern. When event sources trigger events, they are published to Event grid topics. Consumers of those events create Event Grid subscriptions by specifying event types and an event handler that will process the events. If there are no subscribers, the events are discarded. Each event can have multiple subscriptions.

Push Model

Event Grid propagates messages to the subscribers in a push model. Suppose you have an event grid subscription with a webhook. When a new event arrives, Event Grid posts the event to the webhook endpoint.

Custom topics

Create custom Event Grid topics, if you want to send events from your application or an Azure service that isn’t integrated with Event Grid.
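As an illustration, publishing to a custom topic from Python with the azure-eventgrid SDK might look like the sketch below; the endpoint, key, and event type are placeholders for your own topic:

from azure.core.credentials import AzureKeyCredential
from azure.eventgrid import EventGridPublisherClient, EventGridEvent

# Placeholders for your custom topic endpoint and access key
client = EventGridPublisherClient(
    "<topic-name>.<region>-1.eventgrid.azure.net",
    AzureKeyCredential("<topic-access-key>"),
)

event = EventGridEvent(
    subject="orders/1001",
    event_type="Contoso.Order.Created",  # hypothetical event type
    data={"orderId": 1001, "status": "created"},
    data_version="1.0",
)

client.send(event)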

High throughput

Event Grid can route 10,000,000 events per second per region. The first 100,000 operations per month are free.

Resilient delivery

Even though successful delivery for events isn’t as crucial as commands, you might still want some guarantee depending on the type of event. Event Grid offers features that you can enable and customize, such as retry policies, expiration time, and dead lettering.

Azure Event Hubs

When working with an event stream, Azure Event Hubs is the recommended message broker. Essentially, it’s a large buffer that’s capable of receiving large volumes of data with low latency. The data received can be read quickly through concurrent operations. You can transform the data received by using any real-time analytics provider. Event Hubs also provides the capability to store events in a storage account.

Fast ingestion

Event Hubs are capable of ingesting millions of events per second. The events are only appended to the stream and are ordered by time.

Pull model

Like Event Grid, Event Hubs also offers Publisher-Subscriber capabilities. A key difference between Event Grid and Event Hubs is in the way event data is made available to the subscribers. Event Grid pushes the ingested data to the subscribers whereas Event Hub makes the data available in a pull model. As events are received, Event Hubs appends them to the stream. A subscriber manages its cursor and can move forward and back in the stream, select a time offset, and replay a sequence at its pace.

Partitioning

A partition is a portion of the event stream. The events are divided by using a partition key. For example, several IoT devices send device data to an event hub. The partition key is the device identifier. As events are ingested, Event Hubs move them to separate partitions. Within each partition, all events are ordered by time.
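A short sketch of sending device events with a partition key using the azure-eventhub Python SDK (the connection string and hub name are placeholders):

from azure.eventhub import EventHubProducerClient, EventData

# Placeholders - use your own Event Hubs connection string and hub name
producer = EventHubProducerClient.from_connection_string(
    "<event-hubs-connection-string>",
    eventhub_name="<event-hub-name>",
)

with producer:
    # Events sharing a partition key (the device identifier here) land in the
    # same partition and stay ordered by time within it.
    batch = producer.create_batch(partition_key="device-001")
    batch.add(EventData('{"temperature": 21.5}'))
    batch.add(EventData('{"temperature": 21.7}'))
    producer.send_batch(batch)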

Event Hubs Capture

The Capture feature allows you to store the event stream to Azure Blob storage or Data Lake Storage. This way of storing events is reliable because even if the storage account isn’t available, Capture keeps your data for a period, and then writes to the storage after it’s available.

We hope this quick start guide helps you get started with Azure messaging and event-driven architecture.

Azure Databricks lets you spin up clusters and build quickly in a fully managed Apache Spark environment with the global scale and availability of Azure. And of course, for any production-level solution, monitoring is a critical aspect.

Azure Databricks comes with robust monitoring capabilities for custom application metrics, streaming query events, and application log messages. It allows you to push this monitoring data to different logging services.

In this article, we will look at the setup required to send application logs and metrics from Microsoft Azure Databricks to a Log Analytics workspace.

Prerequisites
  1. Clone the repository mentioned below
    https://github.com/mspnp/spark-monitoring.git
  2. Azure Databricks workspace
  3. Azure Databricks CLI
    Databricks workspace personal access token is required to use the CLI
    You can also use the Databricks CLI from Azure Cloud Shell.
  4. Java IDEs with the following resources
    Java Development Kit (JDK) version 1.8
    Scala language SDK 2.11
    Apache Maven 3.5.4
Building the Azure Databricks monitoring library with Docker

After cloning the repository, open a terminal in the cloned path.

Run the build command as follows.
Windows:

docker run -it --rm -v %cd%/spark-monitoring:/spark-monitoring -v "%USERPROFILE%/.m2":/root/.m2 maven:3.6.1-jdk-8 /spark-monitoring/build.sh 

Linux:

chmod +x spark-monitoring/build.sh
docker run -it --rm -v `pwd`/spark-monitoring:/spark-monitoring -v "$HOME/.m2":/root/.m2 maven:3.6.1-jdk-8 /spark-monitoring/build.sh 
Configuring Databricks workspace

dbfs configure --token
It will ask for the Databricks workspace URL and a token.
Use the personal access token that was generated when setting up the prerequisites.
You can get the URL from
Azure portal > Databricks service > Overview

 dbfs mkdirs dbfs:/databricks/spark-monitoring 

Open the file /src/spark-listeners/scripts/spark-monitoring.sh
Now add the Log Analytics  Workspace ID and Key

Use Databricks CLI to copy the modified script

dbfs cp <local path to spark-monitoring.sh> dbfs:/databricks/spark-monitoring/spark-monitoring.sh 

Use Databricks CLI to copy all JAR files generated

dbfs cp --overwrite --recursive <local path to target folder> dbfs:/databricks/spark-monitoring/ 
Create and configure the Azure Databricks cluster
  1. Navigate to your Azure Databricks workspace in the Azure Portal.
  2. On the home page, click on “new cluster”.
  3. Choose a name for your cluster and enter it in the text box titled “cluster name”.
  4. In the “Databricks Runtime Version” dropdown, select 5.0 or later (includes Apache Spark 2.4.0, Scala 2.11).

5. Under “Advanced Options”, click on the “Init Scripts” tab. Go to the last line under the “Init Scripts” section and select “DBFS” under the “destination” dropdown. Enter “dbfs:/databricks/spark-monitoring/spark-monitoring.sh” in the text box. Click the “Add” button.

6. Click the “create cluster” button to create the cluster. Next, click on the “start” button to start the cluster.

Now you can run jobs on the cluster and view the logs in the Log Analytics workspace.
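To verify the pipeline end to end, you can emit a custom log message from a notebook cell through the driver's Log4j logger, which the spark-monitoring library forwards to Log Analytics. A small sketch (the logger name is arbitrary):

# Get a Log4j logger from the JVM through the Spark context
log4j = sc._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("sample-monitoring-test")

logger.info("Test message sent from an Azure Databricks notebook")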

We hope this article helps you set up the right configurations to send application logs and metrics from Azure Databricks to your Log Analytics workspace.

This infographic outlines a day in the life of a remote worker using Microsoft Teams to collaborate, create, and be more productive while working at home. See how this individual uses multiple features to stay connected with the team and work efficiently

The daily life of most workers has changed drastically as COVID19 has made employees move to home offices. But your team can still get work done. Microsoft Teams can make it possible. Contact us to enable MS teams for your organization.

Databricks is a web-based platform for working with Apache Spark that provides automated cluster management and IPython-style notebooks. To understand the basics of Apache Spark, refer to our earlier blog on how Apache Spark works.

Databricks is currently available on Microsoft Azure and Amazon AWS.  In this blog, we will look at some of the components in Azure Databricks.

1.   Workspace

A Databricks Workspace is an environment for accessing all Databricks assets. The Workspace organizes objects (notebooks, libraries, and experiments) into folders, and provides access to data and computational resources such as clusters and jobs.

Create a Databricks workspace

The first step to using Azure Databricks is to create and deploy a Databricks workspace. You can do this in the Azure portal.

  1. In the Azure portal, select Create a resource > Analytics > Azure Databricks.
  2. Under Azure Databricks Service, provide the values to create a Databricks workspace.

    a. Workspace Name: Provide a name for your workspace.
    b. Subscription: Choose the Azure subscription in which to deploy the workspace.
    c. Resource Group: Choose the Azure resource group to be used.
    d. Location: Select the Azure location near you for deployment.
    e. Pricing Tier: Standard or Premium

Once the Azure Databricks service is created, you will get the screen given below.  Clicking on the Launch Workspace button will open the workspace in a new tab of the browser.

2.   Cluster

A Databricks cluster is a set of computation resources and configurations on which we can run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.

To create a new cluster:

  1. Select Clusters from the left-hand menu of Databricks’ workspace.
  2. Select Create Cluster to add a new cluster.

We can select the Scala and Spark versions by selecting the appropriate Databricks Runtime Version while creating the cluster.

3.   Notebooks

A notebook is a web-based interface to a document that contains runnable code, visualizations, and narrative text.  We can create a new notebook using either the “Create a Blank Notebook” link in the Workspace (or) by selecting a folder in the workspace and then using the Create >> Notebook menu option.

While creating the notebook, we must select a cluster to which the notebook is to be attached and also select a programming language for the notebook – Python, Scala, SQL, and R are the languages supported in Databricks notebooks.

The workspace menu also provides us the option to import a notebook, by uploading a file (or) specifying a file.  This is helpful if we want to import (Python / Scala) code developed in another IDE (or) if we must import code from an online source control system like git.

In the below notebook we have python code executed in cells Cmd 2 and Cmd 3; a python spark code executed in Cmd 4.  The first cell (Cmd 1) is a Markdown cell.  It displays text which has been formatted using markdown language.

Magic commands

Even though the above notebook was created with Language as python, each cell can have code in a different language using a magic command at the beginning of the cell.  The markdown cell above has the code below where %md is the magic command:

%md Sample Databricks Notebook 

The following provides the list of supported magic commands:

  • %python – Allows us to execute Python code in the cell.
  • %r – Allows us to execute R code in the cell.
  • %scala – Allows us to execute Scala code in the cell.
  • %sql – Allows us to execute SQL statements in the cell.
  • %sh – Allows us to execute Bash Shell commands and code in the cell.
  • %fs – Allows us to execute Databricks Filesystem commands in the cell.
  • %md – Allows us to render Markdown syntax as formatted content in the cell.
  • %run – Allows us to run another notebook from a cell in the current notebook.

4.   Libraries

To make third-party or locally built code (like .jar files) available to notebooks and jobs running on our clusters, we can install a library. Libraries can be written in Python, Java, Scala, and R. We can upload Java, Scala, and Python libraries and point to external packages in PyPI or Maven.

To install a library on a cluster, select the cluster through the Clusters option in the left-side menu and then go to the Libraries tab.

Clicking on the "Install New" option shows all the options available for installing a library. We can install the library either by uploading it as a JAR file or by pointing to a file in DBFS (Databricks File System). We can also instruct Databricks to pull the library from a Maven or PyPI repository by providing the coordinates.

5. Jobs

During code development, notebooks are run interactively in the notebook UI.  A job is another way of running a notebook or JAR either immediately or on a scheduled basis.

We can create a job by selecting Jobs from the left-side menu and then providing the name of the job, the notebook to be run, and the schedule of the job (daily, hourly, etc.).

Once the jobs are scheduled, they can be monitored from the same Jobs menu.
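
Beyond the UI, jobs can also be managed programmatically through the Databricks REST API. The hedged sketch below simply lists the jobs in a workspace; the workspace URL and personal access token are placeholders you must supply.

# Hedged sketch: list jobs through the Databricks Jobs REST API (version 2.0).
# <databricks-instance> and <personal-access-token> are placeholders.
import requests

response = requests.get(
    "https://<databricks-instance>/api/2.0/jobs/list",
    headers={"Authorization": "Bearer <personal-access-token>"},
)
print(response.json())  # JSON describing the jobs defined in the workspace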

6.   Databases and tables

A Databricks database is a collection of tables. A Databricks table is a collection of structured data. Tables are equivalent to Apache Spark DataFrames. We can cache, filter, and perform any operations supported by DataFrames on tables. You can query tables with Spark APIs and Spark SQL.

Databricks provides us the option to create new Tables by uploading CSV files; Databricks can even infer the data type of the columns in the CSV file.

All the databases and tables created either by uploading files or through Spark programs can be viewed using the Data menu option in the Databricks workspace, and these tables can be queried using SQL notebooks.
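
As a quick sketch of what that looks like from a Python notebook cell (the table and column names below are assumptions, standing in for a table created by uploading a CSV):

# Querying a Databricks table from a Python cell; "sales", "region", and "amount"
# are assumed names for a table created earlier through the Data menu.
summary = spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
display(summary)

# The same table is also available through the DataFrame API:
display(spark.table("sales").limit(10))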

We hope this article helps you get started with Azure Databricks. You can now spin up clusters and build quickly in a fully managed Apache Spark environment with the global scale and availability of Azure.

To keep business-critical applications running 24/7/365, organizations need a sound business continuity and disaster recovery strategy. In this article, we will discuss how to set up disaster recovery for an Azure VM in a secondary region.

We will use Azure Site Recovery, which helps manage and orchestrate disaster recovery of on-premises machines and Azure virtual machines (VMs), including replication, failover, and recovery.

Prerequisites:
  1. A Recovery Services vault
  2. A virtual machine

ENABLING REPLICATION

To replicate a VM to a secondary region, first prepare the Site Recovery infrastructure. In this case, we are replicating from one Azure region to another.

For replication from Azure to Azure, you can go directly to the Recovery Services vault and replicate the VM.

  1. Go to Recovery Services vault >> Replicated Items.
  2. Click on the Replicate icon and follow the steps below.

3. Select the following

  • Source: Azure
  • Source location: Region where the VM is deployed
  • Azure virtual machine deployment model: Resource manager
  • Source subscription: Subscription where the VM is deployed
  • Source Resource group: Resource group where the VM is deployed

4. Select OK to proceed to the next step.

5. Select the VM to replicate.

6. Target configurations

  • Target location: Secondary region where the VM is to be replicated
  • Target subscription: Subscription where the VM is to be replicated

By default, the following resources are created in the target region and can be customized as per your needs:

  • Resource group
  • Virtual network
  • Cache storage account
  • Replica managed disks
  • Target availability sets (if applicable)

You can set replication policies and view extension details here.

7. Click on Create target resources.

8. Then select Enable replication.

Go to Recovery Services vault >> Monitoring >> Site Recovery jobs to view the jobs that run during replication.

Look for the following jobs:

  1. Prerequisites check for enabling protection
  2. Installing Mobility Service and preparing target
  3. Enable replication
  4. Starting initial replication
  5. Updating the provider states

Once the above-mentioned jobs are complete, the following happens:

  • Synchronization process begins
  • Waiting for first recovery point
  • The VM is protected

Once all processes are completed, you can view the VM by going to Recovery Services vault >> Replicated Items.

Introduction to Terraform

Terraform is an open-source tool for managing cloud infrastructure. Terraform uses Infrastructure as Code (IaC) to build, change, and version infrastructure safely. It is used to create, manage, and update infrastructure resources such as virtual machines, virtual networks, and clusters.

The Terraform CLI provides a simple mechanism to deploy and version configuration files to Azure, and with the AzureRM provider you can create, modify, and delete Azure resources from your Terraform configuration.

The infrastructure that Terraform can manage includes low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries, SaaS features, etc.

Providers in Terraform

A provider is responsible for understanding API interactions and exposing resources. Providers are generally cloud or infrastructure platforms, such as:

  • Azure
  • AWS
  • Google Cloud
  • OpenStack
  • Docker
  • Alibaba Cloud
  • VMware   

For each provider, there are many kinds of resources you can create. Here is the general syntax for Terraform resources:

resource "<PROVIDER>_<TYPE>" "<NAME>" {
  [CONFIG]
}

Here PROVIDER is the name of a provider (e.g., Azure), TYPE is the type of resource to create in that provider (e.g., Instance), NAME is an identifier you can use throughout the Terraform code to refer to this resource, and CONFIG consists of one or more arguments that are specific to that resource.

Terraform Features:

Infrastructure as Code

Infrastructure is described using a high-level configuration syntax. This allows a blueprint of your datacenter to be versioned and treated as you would any other code. Additionally, infrastructure can be shared and re-used.

Execution Plans

Terraform has a “planning” step where it generates an execution plan. The execution plan shows what Terraform will do when you call apply. This lets you avoid any surprises when Terraform manipulates infrastructure.

Resource Graph

Terraform builds a graph of all your resources and parallelizes the creation and modification of any non-dependent resources. Because of this, Terraform builds infrastructure as efficiently as possible, and operators get insight into dependencies in their infrastructure.

Change Automation

Complex changesets can be applied to your infrastructure with minimal human interaction. With the previously mentioned execution plan and resource graph, you know exactly what Terraform will change and in what order, avoiding many possible human errors.

TERRAFORM STRUCTURE

The primary module structure requirement is that a “root module” must exist. The root module is the directory that holds the Terraform configuration files that are applied to build your desired infrastructure. Any module should include, at a minimum, a “main.tf”, a “variables.tf” and “outputs.tf” file.

main.tf calls modules, locals, and data-sources to create all resources. If using nested modules to split up your infrastructure’s required resources, the “main.tf” file holds all your module blocks and any needed resources not contained within your nested modules.

variables.tf contains the input variable declarations.

outputs.tf tells Terraform what data is important. This data is outputted when “apply” is called and can be queried using the Terraform “output” command. It contains outputs from the resources created in main.tf.

TFVARS File – To persist variable values, create a file and assign variables within this file. Within the current directory, Terraform automatically loads all files that match terraform.tfvars or *.auto.tfvars to populate variables.

MODULES

Modules are subdirectories with self-contained Terraform code. A module is a container for multiple resources that are used together. The root module is the directory that holds the Terraform configuration files that are applied to build your desired infrastructure. The root module may call other modules and connect them by passing output values from one as input values of another.

In production, we may need to manage multiple environments, and different products with similar infrastructure. Writing code to manage each of these similar configurations increases redundancy in the code.  And finally, we need the capability to test different versions while keeping the production infrastructure stable.

Terraform provides modules that allow us to abstract away re-usable parts, which can be configured once, and used everywhere. Modules allow us to group resources, define input variables which are used to change resource configuration parameters and define output variables that other resources or modules can use.

Modules can also call other modules using a “module” block, but we recommend keeping the module tree relatively flat and using module composition as an alternative to a deeply nested tree of modules, because this makes the individual modules easier to re-use in different combinations.

Terraform Workflow:

These are the steps to build infrastructure with Terraform:

  • Init
  • Plan
  • Apply
  • Destroy

INIT

Initialize the Terraform configuration directory using Terraform “init”.

Init creates a hidden directory ".terraform" and downloads the plugins needed by the configuration. Init also accepts the "-backend-config" option, which can be used for partial backend configuration.

Command

terraform init -backend-config="backend-dev.config"

backend-dev.config – This file contains the details shown in the screenshot below.

PLAN

The terraform plan command is used to create an execution plan. The plan shows all the resources that will be created, updated, or deleted before the changes are applied; the actual creation happens with the "apply" command.

The var file given will define resources that are unique for each team.

Command:

terraform plan -var-file="parentvarvalues.tfvars"

This file includes all global variables and Azure subscription details.

APPLY

The Terraform “apply” command is used to apply changes in the configuration. You’ll notice that the “apply” command shows you the same “plan” output and asks you to confirm if you want to proceed with this plan.

The "-auto-approve" parameter skips the confirmation prompt when creating resources. It's better to omit it when you apply directly without first reviewing a "plan".

Command:

terraform apply -var-file="parentvarvalues-team1.tfvars" -auto-approve

Terraform State Management:

Terraform stores the resources it manages in a state file. There are two types of state files: "remote" and "local". While the "local" state is fine for an isolated developer, the "remote" state is indispensable for a team, as each member needs to share the infrastructure state whenever there is a change.

Terraform compares those changes with the state file to determine what changes result in a new resource or resource modifications. Terraform stores the state about our managed infrastructure and configuration. This state is used by Terraform to map real-world resources to our configuration, keep track of metadata, and to improve performance for large infrastructures.

TERRAFORM IMPORT

The terraform import command is used to import existing infrastructure. This allows you to take resources you've created by some other means and bring them under Terraform management. It is a great way to slowly transition infrastructure to Terraform.

resource "azurerm_resource_group" "name" {
  # instance configuration
}

terraform import azurerm_resource_group.name /subscriptions/<subscription_id>/resourceGroups/<resource_group_name>

You want to import the state that already exists, so that the next time you run the "apply" command, Terraform already knows the resource exists, and any changes made going forward are picked up as modifications.

79% of analytics users encounter data questions they can’t solve each month. How are you helping your team to uncover insights from data?

Moving data isn’t always easy. There are more than 340 types of databases in use today and moving data across them presents challenges for any IT team. At CloudIQ Technologies, we have years of experience helping businesses find the IT solutions that can keep up with their constantly evolving business practice. Whether you need a solution for storage, data transfer, or just need to gain better insights from your data, we can help.

KEXP is known internationally for their music and authenticity. To help bring their global audience the music they want, KEXP needed to find a solution that could bring all of their services online. While they eventually accomplished their goal, they did run into roadblocks along the way.

Your partners at CloudIQ Technologies and Microsoft can help you overcome any obstacle. With years of industry experience, you can rest assured knowing your custom IT solution will be up and running in no time. Contact us today to find out more about how we can help.

Cloud security is a major challenge for organizations running mission-critical applications on the cloud. One of the biggest risks from hackers comes via open ports, and Microsoft Azure Security Center provides a great option to manage this threat – Just-in-Time VM access!

What is Just-in-Time VM access?

With Just-in-Time VM access, you can define which VMs and which ports can be opened and controlled, and for how long. Just-in-Time access locks down and limits the ports of Azure virtual machines to defend against malicious attacks, providing access to a port only for a limited amount of time. Essentially, you block all inbound traffic at the network level.

When Just-In-Time access is enabled, every user’s request for access will be routed through Azure RBAC, and access will be granted only to users with the right credentials. Once a request is approved, the Security Center automatically configures the NSGs to allow inbound traffic to these ports – only for the requested amount of time, after which it restores the NSGs to their previous states.

The Just-in-Time option is available only in the Standard Security Center tier and applies only to VMs deployed via Azure Resource Manager.

What are the permissions needed to configure and use JIT?
To configure or edit a JIT policy for a VM, assign these actions to the role:
  • On the scope of a subscription or resource group that is associated with the VM: Microsoft.Security/locations/jitNetworkAccessPolicies/write
  • On the scope of a subscription or resource group of the VM: Microsoft.Compute/virtualMachines/write

To request JIT access to a VM, assign these actions to the user:
  • On the scope of a subscription or resource group that is associated with the VM: Microsoft.Security/locations/jitNetworkAccessPolicies/initiate/action
  • On the scope of a subscription or resource group that is associated with the VM: Microsoft.Security/locations/jitNetworkAccessPolicies/*/read
  • On the scope of a subscription, resource group, or VM: Microsoft.Compute/virtualMachines/read
  • On the scope of a subscription, resource group, or VM: Microsoft.Network/networkInterfaces/*/read

Why Just-in-Time access?

Consider the scenario where a virtual machine is deployed in Azure, and the management port is opened for all IP addresses all the time. This leaves the VM open for brute force attack.

Brute force attacks usually target management ports like SSH (22) and RDP (3389). If an attacker compromises the security, the whole VM is open to them. Even with NSG firewalls enabled in our Azure infrastructure, it is best to limit the exposure of management ports to the team, and only for a limited amount of time.

How to enable JIT?

Just-in-Time access can be implemented in two ways:

  1. Go to Azure Security Center and click on Just-in-Time VM access.

2. Go to the VM, then click on Configuration and "Enable JIT".

How to set up port restrictions?

Go to Azure Security Center and click on recommendations for Compute & Apps.

Select the VM and click Enable JIT on VMs.

It will then show a list of recommended ports. It is possible to add additional ports as per your requirements. The default port list is shown below.

Now click on the port that you wish to restrict. A new tab will appear with information on the protocol to be allowed and the allowed source IPs (a single IP address or a CIDR range).

The main thing to note is the request time. The default time is 3 hours; it can be increased or decreased as per your requirements. Then click OK.

Click OK and the VM will appear in the Just-in-Time access window in Security Center.

What changes will happen in the infrastructure when JIT is enabled?

Azure Security Center will create a new Deny rule in the Network Security Group's inbound security rules, with a lower priority number (and therefore higher precedence) than the original management port's Allow rule.

If the VM is behind an Azure Firewall, the same rule override occurs in the Azure Firewall as well.

How to connect to the JIT enabled VM?

Go to Azure security center and navigate to Just-in-Time access.

Select the VM that you need to access and click on “Request Access”

This will take you to the next page, where extra details need to be provided for connectivity:

  1. Click the ON toggle
  2. Provide the allowed IP ranges
  3. Select the time range
  4. Provide a justification for VM access
  5. Click on Open Ports

This process overrides the NSG Deny rule by creating a new Allow rule for the selected port, with a lower priority number (higher precedence) than the Deny rule.

The above-mentioned connectivity process offers two source options:

  1. IP Range
  2. My IP

IP Range:

In this option, we can provide either a single IP or a CIDR block.

MY IP:

Case 1: If you're connected to Azure via a public network, i.e., without any IPsec tunnel, selecting MY IP registers the public IP address of the device you're connecting from.

Case 2: If you're connected to Azure via a VPN/IPsec tunnel/VNet gateway, you can't use the MY IP option, since it captures your public IP, which is not the address your traffic arrives from. In this scenario, provide the private IP address of the VPN gateway for a single user, or, to allow a group of users, provide the private IP CIDR block of the whole organization.

How to monitor who’s requested the access?

Users who request access are recorded in the Activity log. To view these activities in a Log Analytics workspace, link the subscription's Activity log to Log Analytics.

Once the activity log is configured to flow into Log Analytics, it is possible to view the list of users who requested access using the Kusto Query Language (KQL).

In September 2019, Azure announced a brand-new service – Azure Private Link, a very important tool for service providers offering a mix of Azure IaaS and PaaS services.

Azure Private Link enables you to access Azure PaaS Services (for example, Azure Storage and SQL Database) and Azure-hosted customer-owned/partner services over a Private Endpoint in your virtual network. Traffic between your virtual network and the service traverses over the Microsoft backbone network, eliminating exposure from the public Internet. It can be used via a local IP address (on Azure and from on-premises networks) or via a dedicated Azure ExpressRoute network.

Well, naturally, the first benefit is security!  It reduces the exposure of PaaS services to the Internet and provides a secure way to manage traffic between the client’s network and Azure. With Private Link Service, data stays within Microsoft’s system and the client’s private network.

For service providers and their clients, this is obviously critical as it provides secure access to customers in their virtual network while giving them the ability to use the resources in the service provider’s subscription.

Find out how a Private Link Service can be created behind a standard load balancer.

In the example below, Kubernetes Ingress Service is exposed as a Private Link Service. The ingress has a Standard Load Balancer with IP Address 172.17.1.100.

Details of Ingress Service (Internal Load Balancer) 

cloudiq@hubandspoke:~$ kubectl get service -A | grep  LoadBalancer
dev   ciq-demo-ingress-nginx-ingress-controller   LoadBalancer   192.168.3.11   172.17.1.100   80:32314/TCP,443:30694/TCP   43h

The service can be accessed as below from within the VNet (ciq-demo-vnet):

http://172.17.1.100/web/api/imageresult
Added this method for testing this API in API-MGMT. The current time is : 02/20/2020 10:07:23

The Private Link service is created with the following details:

  • Alias – It is a unique URI identifying the service and can be accessed from anywhere within Azure.
  • NAT IP – This determines the source IP and destination IP of incoming and outgoing packets to the Private Link service, respectively. This NAT IP can be within any subnet of the service provider VNet.

Next, you create a private endpoint in the consumer VNet/subnet. In our example, we have created a network interface in the ciq-devops-general-rq-vnet/default VNet/subnet. The private IP within the VNet/subnet is 10.0.0.4. The Kubernetes ingress service can be accessed from the consumer VNet using the 10.0.0.4 private IP.

cloudiq@cloudiq-build-agent-vm:~$ curl http://10.0.0.4/web/api/imageresult
Added this method for testing this API in API-MGMT. The current time is : 02/20/2020 10:09:03
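
The same check can be scripted as well; here is a minimal Python sketch that mirrors the curl call above against the private endpoint IP.

# Minimal sketch: call the ingress service through the private endpoint (10.0.0.4)
# from a VM inside the consumer VNet, mirroring the curl command above.
import requests

response = requests.get("http://10.0.0.4/web/api/imageresult", timeout=10)
print(response.status_code, response.text)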

Private Link can be enabled for other Azure resources as well, such as those below.

For example, the private endpoint was enabled for a Storage account.

cloudiq@cloudiq-build-agent-vm:~$ curl http://k8sworkshopstg.blob.core.windows.net/test/hw.txt

Hello World!

cloudiq@cloudiq-build-agent-vm:~$ nslookup k8sworkshopstg.blob.core.windows.net
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
k8sworkshopstg.blob.core.windows.net    canonical name = k8sworkshopstg.privatelink.blob.core.windows.net.
Name:   k8sworkshopstg.privatelink.blob.core.windows.net
Address: 10.0.0.5

cloudiq@cloudiq-build-agent-vm:~$ curl http://k8sworkshopstg.privatelink.blob.core.windows.net/test/hw.txt

Hello World!

Welcome to Cloud View!

This week we look at why 2020 will be the launch of the ‘Data Decade’, top CX trends, general availability of Azure Sphere, and troubleshooting common problems in Kubernetes Deployments.

Industry News & Perspectives

Launch of the “Data Decade”

CRN asked nearly 80 CEOs five questions about how digital technologies will shape up in 2020 and beyond. Here is a summary of what they said.

Customer experience is no longer just the responsibility of client-facing departments. As more customers shop online, CX has become a top priority for CIOs as well. Here are 5 technology trends that should be a part of every CIO’s strategy.


Technology Updates

Azure Sphere

From its inception in Microsoft Research to general availability today, Azure Sphere is Microsoft's answer to escalating IoT threats. An interview with Galen Hunt, distinguished engineer and product leader of Azure Sphere.


DevOps and Agility

As DevOps becomes mainstream and with a range of frameworks to choose from, is DevOps losing its agility? Maybe, maybe not! Here is an article that will reconnect you to the grounding principle of DevOps – innovation, and agility.


Microsoft Datacenters in Spain

Microsoft announces its strategy for establishing new European datacenters in Spain. While there is no fixed launch date, the company has announced that the proposed DCs will deliver Azure, Microsoft 365, Dynamics 365, and the Power Platform.


From CloudIQ

Optimizing Azure Cosmos DB Performance

Azure Cosmos DB allows Azure platform users to elastically and independently scale throughput and storage across any number of Azure regions worldwide. Here is an article on how to optimize Cosmos DB performance.

How to Debug and Troubleshoot Common Problems in Kubernetes Deployments

Kubernetes deployment issues are not always easy to troubleshoot. In some cases, the errors can be resolved easily, and, in some cases, detecting errors requires us to dig deeper and run various commands to identify and resolve the issues. Here is a guided tutorial to debug applications that are deployed into Kubernetes.

Welcome to Cloud View!

This week we look at Microsoft’s Azure strategy, Gartner’s getting smarter about digital business, DevSecOps, debugging Kubernetes applications and a deep dive into Kubernetes networking.

Industry Viewpoints & News

Smarter with Gartner

Kickstart the week with this incredibly ‘smart’ article by Smarter with Gartner. The article sounds a warning about sticking to the old ways of doing digital.

Microsoft’s Azure Strategy

Gavriella Schuster, Microsoft’s Channel Chief, sat down for an illuminating chat with CRN. The conversation revealed Microsoft’s Azure strategy, its channel investment priorities, and its upcoming plans for its partners.

Google Cloud acquires Cornerstone

Google Cloud just finalized another big acquisition with Cornerstone Technology, a mainframe specialist. The new purchase fits in perfectly with Google Cloud’s strategy to make the shifting of legacy applications on to the cloud easier.


Technical Insights 

DevSecOps

When paired together, Security and DevOps can offer organizations more robust and baked-in security. Find out how companies can do DevSecOps correctly in this two-part series by Devops.com.


Azure Firewall Manager

Microsoft extends Azure Firewall Manager preview to include automatic deployment and central security policy management for Azure Firewall in hub virtual networks.


Debugging a Kubernetes Application

General troubleshooting and debugging techniques for an application running in a Kubernetes environment and the most common issues to expect.


From CloudIQ

Kubernetes Networking Deep Dive – Data Plane, how it Works Under the Hood?

Kubernetes is simple enough to get started with; however, one of its most complex and critical parts is networking. Here is a deep dive into the data plane and how it works under the hood.

Microservices – Aligning business and technology for closer collaboration, agility & flexibility

A microservices-based architecture introduces agility and flexibility and supports a sustainable DevOps culture, ensuring closer collaboration within businesses. The good news is that it's happening for those who embrace it.

Welcome to Cloud View!

This week we look at Oracle Cloud Data Science Platform, GKE support for Windows, DevSecOps and making better quality software using Jenkins CI/CD pipeline.

Industry Perspective and News

Oracle Cloud Data Science Platform

Oracle is pulling out all the stops in 2020! The company is in the news once again with the launch of its data science platform – a first-of-its-kind cloud-native platform that is completely geared towards providing a collaborative workspace for data scientists.

Windows on GKE

Google Cloud now supports Windows on GKE, as part of its commitment to providing complete support for clients' Windows Server-based applications.

Six I’s of Successful IT Leaders

As every company becomes a tech company, the role of the CIO becomes critical for success. Naturally, this increases the pressure on CIOs to deliver more and think strategically. Here is an article that outlines six key focus areas that CIOs can start with.


Technological Insights

Storage is getting a reboot!

IBM is the first one to revamp its storage lines with the launch of Storwize and Flash Systems A9000.


DevSecOps

Security as a topic is never far off in any IT discussion. Here is a useful webinar on DevOpsTV by Jake King, CEO of Cmd, in which he lays out 7 DevOps-friendly techniques that will help you incorporate security without compromising on speed or scale.


CockroachDB

A fantastic chat with Peter Mattis, the creator of the CockroachDB open-source database and co-founder and CTO of Cockroach Labs. A conversation that covers interesting bits from his career in open source and Google.


From CloudIQ

Make Better Quality Software using Jenkins for your CI/CD Pipeline

With Jenkins, organizations can accelerate the software development process through automation. Jenkins integrates development life-cycle processes of all kinds, including build, document, test, package, stage, deploy, static analysis and much more.

Blue Green Deployment on Azure, Safe Strategy with Zero Downtime

When you are deploying a new change into production, the associated deployment should be in a predictable manner. In simple terms, this means no disruption and zero downtime!

Welcome to Cloud View!

This week we look at the State of DevOps, Hyperledger Fabric on AKS Marketplace template, Serverless computing frameworks and InfluxDB on GCP.

Industry News & Perspectives

Robot Resource Organizations

The adoption of new digital technologies and the ever-changing expectations of customers continue to challenge traditional retailers, forcing them to investigate new human-machine hybrid operational models, including artificial intelligence (AI), automation, and robotics.

Oracle Cloud

While the cloud market pie is divided into 3-4 large slices, there is still plenty of business in the thin sliver left for ‘others’. Oracle Cloud is doing its best to capture this small market and maybe become more than a niche infrastructure provider.

State of DevOps

The 8th annual ‘State of DevOps’ survey reports that the retail sector (as always) is the most advanced when it comes to DevOps adoption. Read on for more details.


Technical Insights

Hyperledger Fabric on AKS Marketplace template

Users with little knowledge of Azure or Hyperledger Fabric will now be able to easily set up a blockchain consortium on Azure with the new Hyperledger Fabric on Azure Kubernetes Service marketplace template.


Serverless computing frameworks

A new report by Datadog shows that almost 50% of the companies using its platform are opting for AWS Lambda serverless computing framework.


InfluxDB on GCP

Database provider InfluxDB is now live on GCP. It announced the availability of its managed cloud service as a part of Google Cloud’s open-source umbrella. Next on the agenda is the rollout of its second-generation serverless offering on Azure.


From CloudIQ

Develop Faster with Continuous Integration & the Tools to Get the Job Done

In recent years CI has become a best practice for software development and is guided by a set of key principles. Among them are revision control, build automation and automated testing.

Installing and Using HELM, the Package Manager for Kubernetes

Helm is a package manager for Kubernetes that allows developers and operators to more easily package, configure, and deploy applications and services onto Kubernetes clusters.

Welcome to Cloud View!

This week we look at Stackshare’s top 140 tools for developers in 2019, auto-labelling tool for AI developers, using Terraform for multi-cloud orchestration strategy and more.

Industry Perspective & News

Launchable aims to increase delivery velocity

This is the best use of any extra 25 minutes you have today. Jenkins founder Kohsuke Kawaguchi (KK) and respected DevOps veteran Harpreet Singh have launched a new company called Launchable that aims to increase delivery velocity. They talk all about it in this podcast (if you prefer to read, then the link has a complete transcript too).  

AWS maintains market share

January ended with the news of AWS posting revenues of $9.95 billion in Q4 2019 – bringing its 2019 revenue up to a grand total of $40bn. But, as we say, "cloud is complex," and there are plenty of big fish in the cloud market.


Tech Insights

Top 140 tools for developers in 2019

Here is an article every developer MUST bookmark. StackShare analyzed over four million data points shared in their community and shortlisted the definitive list of the top 140 tools for developers in 2019!


Auto-labelling tool for AI developers

After that mammoth list, we decided to continue with the tool theme. So here is another one – a new auto-labeling tool for AI developers by IBM.


Cisco HyperFlex Application Platform (HXAP)

Cisco launched a tool (or rather a whole bouquet of tools) that lets customers build their own cloud-native environments. The HyperFlex Application Platform (HXAP) offers a whole host of integrated tools such as container networking, storage, a load balancer, and more.


From CloudIQ

Kubernetes on Azure: A 2-day workshop for AKS developers

Container technology has revolutionized the DevOps landscape and offers organizations the chance to develop and test applications faster and more cost-effectively. CloudIQ’s 2-day hands-on workshop is designed to give DevOps team members the opportunity to skill-up and learn Kubernetes design, deployment, and management.

Terraform for Multi-Cloud Orchestration Strategy

Terraform, being cloud-agnostic, allows a single configuration to manage multiple providers and even handle cross-cloud dependencies, simplifying management and orchestration.

There are many DevOps lifecycle tools out there; however, GitLab is a complete package designed for coordinating CI/CD pipelines.

GitLab is a web-based DevOps lifecycle tool. This application offers functionality to automate the entire DevOps life cycle from planning to creation, build, verify, security testing, deploying, and monitoring, offering high availability and replication. It is highly scalable and can be used on-prem or on the cloud. GitLab also includes a wiki, issue-tracking, and CI/CD pipeline features.

When DevOps projects are spread across large, geographically dispersed teams a complete DevOps tool is highly useful to maintain collaboration, incorporate feedback, avoid mistakes, and speed up the development process.

GitLab goes beyond being just a repository manager; it has a built-in CI/CD, which saves enormous amounts of time and keeps the workflow smooth. Along with its own CI/CD, GitLab also allows for a range of 3rd party integrations with external CI, so you always have the option of working with the tools based on your workflow. 

Here is a quick run-through on how to start with GitLab.

Project:

In GitLab, we can create projects for hosting codebase, use it as an issue tracker, collaborate on code, and continuously build, test, and deploy apps with built-in GitLab CI/CD. Projects can be available publicly, internally, or privately, at our choice. GitLab does not limit the number of private projects we create.

Create a project in GitLab

In the dashboard, click the green “New project” button or use the plus icon in the navigation bar. This opens the New Project page.

On the New Project page:

  • Create a Blank project
  • Fill in the name of your project in the Project name field
  • The Project URL field is the URL path for the project that the GitLab instance will use
  • The Project slug field will be auto-populated
  • The Project description (optional) field enables you to enter a description for the project’s dashboard
  • Changing the Visibility Level modifies the project’s viewing and access rights for users
  • Selecting the Initialize repository with a README option creates a README file so that the Git repository is initialized, has a default branch, and can be cloned
  • Click Create project

Repository

A repository is part of a project, which also includes many other features.

Host your codebase in GitLab repositories by pushing files to GitLab. You can either use the user interface (UI) or connect your local computer with GitLab through the command line.

GitLab Basic Commands: https://docs.gitlab.com/ee/gitlab-basics/command-line-commands.html

Branch

When you create a new project, GitLab sets the master as the default branch for your project. You can choose another branch to be your project’s default under your project’s Settings > Repository.

Commits

When you commit your changes, you are introducing those changes to your branch. Via a command line, you can commit multiple times before pushing.

A commit message is important to identify what is being changed and, more importantly, why. In GitLab, you can add keywords to the commit message that will perform one of the actions below:

  • Trigger a GitLab CI/CD pipeline: If you have your project configured with GitLab CI/CD, you will trigger a pipeline per push, not per commit.
  • Skip pipelines: You can add the keyword [ci skip] to your commit message, and GitLab CI will skip that pipeline.
  • Cross-link issues and merge requests: Cross-linking is great to keep track of what’s somehow related in your workflow. If you mention an issue or a merge request in a commit message, they will be shown on their respective thread.

CI/CD Pipeline

Continuous Integration works by pushing small code chunks to your application’s code base hosted in a Git repository and, on every push, running a pipeline of scripts to build, test, and validate the code changes before merging them into the main branch.

Continuous Delivery and Deployment go a step further than CI, deploying your application to production on every push to the default branch of the repository.

These methodologies allow you to catch bugs and errors early in the development cycle, ensuring that all the code deployed to production complies with the code standards you established for your app.

Two top-level components are:

  1. .gitlab-ci.yml
  2. GitLab Runner

.gitlab-ci.yml

The .gitlab-ci.yml file is where we configure what CI does with the project. It lives in the root of the repository. On any push to the repository, GitLab will look for the .gitlab-ci.yml file and start jobs on Runners according to the contents of the file, for that commit.

Pipeline configuration begins with jobs. Jobs are the most fundamental element of a .gitlab-ci.yml file.

Jobs are:

  • Defined with constraints stating under what conditions they should be executed
  • Top-level elements with an arbitrary name and must contain at least the script clause
  • Not limited in how many can be defined

For example:

job1:
  script: "execute-script-for-job1"

job2:
  script: "execute-script-for-job2"

GitLab Runner

GitLab Runner is a build instance that is used to run jobs and send the results back to GitLab. It is designed to run on the GNU/Linux, macOS, and Windows operating systems.

Runners run the code defined in .gitlab-ci.yml. They are isolated (virtual) machines that pick up jobs through the coordinator API of GitLab CI. If we want to use Docker, we should install the latest version; GitLab Runner requires a minimum of Docker v1.13.0.

Types of Runners:
  1. Specific Runner – useful for jobs that have special requirements or for projects with a specific demand.
  2. Shared Runner – useful for jobs that have similar requirements between multiple projects. Rather than having multiple Runners idling for many projects, you can have a single or a small number of Runners that handle multiple projects.
  3. Group Runner – useful when you have multiple projects under one group and would like all projects to have access to a set of Runners. Group Runners process jobs using a FIFO (First In, First Out) queue.

How to use a Runner:
  1. Install Runner  –  https://docs.gitlab.com/runner/#install-gitlab-runner
  2. Register Runner – https://docs.gitlab.com/runner/register/index.html

Sample Docker Project:

  1. Create/upload two files: .gitlab-ci.yml and Dockerfile

File contents:

.gitlab-ci.yml:

# Official docker image.
image: docker:19.03.0-dind

services:
  - docker:19.03.0-dind

before_script:
  - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY

build-master:
  stage: build
  script:
    - docker build --pull -t "$CI_REGISTRY_IMAGE" .
    - docker push "$CI_REGISTRY_IMAGE"
  tags:
    - docker

Dockerfile

FROM python:2.7
RUN pip install howdoi
CMD ["howdoi"]

Once we create the .gitlab-ci.yml file, each push will trigger the pipeline. Since we haven’t created the Runner yet, add “[ci skip]” to the commit message while creating these files. This will skip the CI/CD pipeline.

2. Install Runner

For this example, we are using a specific runner and are going to install a runner in Windows. Refer to this link to Install Runner in Windows: https://docs.gitlab.com/runner/install/windows.html

3. Register Runner:

In order to register a runner, we need a registration token, which can be found under Settings > CI/CD > Runners.

Check the Specific Runners section shown below; you can copy the token from there.

To register a Runner under Windows run the following command in the path where we install GitLab Runner:
./gitlab-runner.exe register

Enter your GitLab instance URL:
Please enter the gitlab-ci coordinator URL (e.g. https://gitlab.com)
https://gitlab.com

Enter the token you obtained to register the Runner:
Please enter the gitlab-ci token that we copied earlier

Enter a description for the Runner, you can change this later in GitLab’s UI:
Please enter the gitlab-ci description for this Runner
[hostname] my-runner

Enter the tags associated with the Runner, you can change this later in GitLab’s UI:
Please enter the gitlab-ci tags for this Runner (comma separated):
docker

Enter the Runner executor:
Please enter the executor: ssh, docker+machine, docker-ssh+machine, kubernetes, Docker, parallels, virtualbox, docker-ssh, shell:
docker

If you chose Docker as your executor, you’ll be asked for the default image to be used for projects that do not define one in .gitlab-ci.yml:
Please enter the Docker image (eg. ruby:2.6):
python:2.7

Once the Runner is created successfully, it will be displayed under Settings > CI/CD > Runner > Specific Runner section.

4. Now, go to CI/CD > Pipelines and click the Run Pipeline button.

It will open a new window. Click the Run Pipeline button again.

5. Once the job completes successfully, it will be displayed as below.

Welcome to Cloud View!

This week we look at some of the infrastructure and operations trends and insights on MLOps, Amazon Alexa and Blue green deployment on Azure.

Industry Perspective & News

AWS slashes cost of DR

Celebrate! AWS has announced massive cloud cost reductions on disaster recovery and Kubernetes services.

Another week, another bunch of predictions. The difference here is that the article also details some trends that are on the wane.  


Tech Insights

MongoDB Supports GraphQL

MongoDB is now supporting GraphQL language for accessing its serverless application platform. This promises to extend their technology towards web and mobile apps.


CircleCI Orbs

In just a year since its launch, CircleCI Orbs are being used by over 13,000 organizations in close to 9 million CI/CD pipelines. New collaborations with 20 partners now extend the Orbs ecosystem even further.


MLOps

MLOps – the collaborative best practices that accelerate the machine lifecycle across model development, deployment, monitoring, and more – can give organizations a massive edge against competitors. John ‘JG’ Chirapurath, General Manager, Azure Data & AI explains more in this blog.


From CloudIQ

Amazon Alexa Custom Skills – How to Build One Step-by-Step

Amazon’s Alexa is the voice-activated, interactive AI bot designed to respond to a number of commands and converse with people. Alexa Skills are apps that give Alexa even more abilities.

Blue Green Deployment on Azure, Safe Strategy with Zero Downtime

In Azure, different processes are available for implementing the Blue-Green strategy with two environments. In this article we discuss some of these techniques.

Cybersecurity is the number one concern for CEOs and is unanimously seen as the biggest threat in the coming years. Reports suggest that damages from cyberattacks will amount to $6 trillion annually by 2021.

While a lot of news coverage is given to malicious hackers and ransomware attacks, another crucial area of cyber protection is tightening the internal defenses with intelligent identity management. Keeping a tight control on who can get past your firewalls is vital for maintaining optimum security.

In this article we will review the comprehensive set of security tools available in Azure Cloud.

Azure Active Directory

Multi-Factor Authentication

Azure Multi-Factor Authentication (MFA) helps safeguard access to data and applications while maintaining simplicity for users. It provides additional security by requiring a second form of authentication and delivers strong authentication via a range of easy to use authentication methods. Users may or may not be challenged for MFA based on configuration decisions that an administrator makes.

The security of two-step verification lies in its layered approach. Compromising multiple authentication factors presents a significant challenge for attackers. Even if an attacker manages to learn the user’s password, it is useless without also having possession of the additional authentication method. It works by requiring two or more of the following authentication methods: something you know (typically a password), something you have (a trusted device that is not easily duplicated, like a phone), and something you are (biometrics).

Conditional Access policies

Conditional Access is the tool used by Azure Active Directory to bring signals together, to make decisions, and enforce organizational policies. Conditional Access policies at their simplest are if-then statements; if a user wants to access a resource, then they must complete an action. Example: A payroll manager wants to access the payroll application and is required to perform multi-factor authentication to access it.

Azure AD identity protection

Identity Protection is a tool that allows organizations to accomplish three key tasks: Automate the detection and remediation of identity-based risks, Investigate risks using data in the portal, Export risk detection data to third-party utilities for further analysis. The signals generated by and fed to Identity Protection, can be further fed into tools like Conditional Access to make access decisions, or fed back to a security information and event management (SIEM) tool for further investigation based on your organization’s enforced policies.

Azure AD Privileged Identity Management

Privileged Identity Management provides time-based and approval-based role activation to mitigate the risks of excessive, unnecessary, or misused access permissions on resources that you care about. Here are some of the key features of Privileged Identity Management:

  • Provide just-in-time privileged access to Azure AD and Azure resources
  • Assign time-bound access to resources using start and end dates
  • Require approval to activate privileged roles
  • Enforce multi-factor authentication to activate any role
  • Use justification to understand why users activate
  • Get notifications when privileged roles are activated
  • Conduct access reviews to ensure users still need roles
  • Download audit history for internal or external audit

Network Security

Network Security Groups (NSGs)

Network security group security rules are evaluated by priority using the 5-tuple information (source, source port, destination, destination port, and protocol) to allow or deny the traffic. A flow record is created for existing connections. Communication is allowed or denied based on the connection state of the flow record. The flow record allows a network security group to be stateful. If you specify an outbound security rule to any address over port 80, for example, it’s not necessary to specify an inbound security rule for the response to the outbound traffic. You only need to specify an inbound security rule if communication is initiated externally. The opposite is also true. If inbound traffic is allowed over a port, it’s not necessary to specify an outbound security rule to respond to traffic over the port. Existing connections may not be interrupted when you remove a security rule that enabled the flow. Traffic flows are interrupted when connections are stopped, and no traffic is flowing in either direction, for at least a few minutes.

Azure Firewall

With Azure Firewall, you can configure – Application rules that define fully qualified domain names (FQDNs) that can be accessed from a subnet and Network rules that define source address, protocol, destination port, and destination address. Network traffic is subjected to the configured firewall rules when you route your network traffic to the firewall as the subnet default gateway.

Application security groups

Application security groups enable you to configure network security as a natural extension of an application’s structure, allowing you to group virtual machines and define network security policies based on those groups. You can reuse your security policy at scale without manual maintenance of explicit IP addresses. The platform handles the complexity of explicit IP addresses and multiple rule sets, allowing you to focus on your business logic.

Resource management security

Azure resource locks

As an administrator, you may need to lock a subscription, resource group, or resource to prevent other users in your organization from accidentally deleting or modifying critical resources. You can set the lock level to CanNotDelete or ReadOnly. In the portal, the locks are called Delete and Read-only, respectively. CanNotDelete means authorized users can still read and modify a resource, but they can’t delete the resource. ReadOnly means authorized users can read a resource, but they can’t delete or update the resource. Applying this lock is similar to restricting all authorized users to the permissions granted by the Reader role.

Azure policies

Azure Policy is a service in Azure that you use to create, assign, and manage policies. These policies enforce different rules and effects over your resources, so those resources stay compliant with your corporate standards and service level agreements. Azure Policy meets this need by evaluating your resources for non-compliance with assigned policies. All data stored by Azure Policy is encrypted at rest. For example, you can have a policy to allow only a certain SKU size of virtual machines in your environment. Once this policy is implemented, new and existing resources are evaluated for compliance.

Custom RBAC roles

Granting permission using custom Azure AD roles is a two-step process that involves creating a custom role definition and then assigning it using a role assignment. A custom role definition is a collection of permissions that you add from a preset list. These permissions are the same permissions used in the built-in roles.

Once you’ve created your role definition, you can assign it to a user by creating a role assignment. A role assignment grants the user the permissions in a role definition at a specified scope. This two-step process allows you to create a single role definition and assign it many times at different scopes. A scope defines the set of Azure AD resources the role member has access to.

Encryption for data at rest

Azure SQL Database Always Encrypted

Always Encrypted is a data encryption technology in Azure SQL Database and SQL Server that helps protect sensitive data at rest on the server, in motion between client and server, and while the data is in use, ensuring that sensitive data never appears as plaintext inside the database system. After you encrypt data, only client applications or app servers that have access to the keys can access plaintext data.
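
As a hedged illustration (not an exact setup), a client using the Microsoft ODBC driver opts into Always Encrypted with the ColumnEncryption connection keyword; the server, database, credentials, table, and column below are placeholders.

# Sketch: with ColumnEncryption=Enabled, the ODBC driver decrypts Always Encrypted
# columns on the client, provided the application can access the column master key.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<server>.database.windows.net,1433;"
    "Database=<database>;"
    "Uid=<user>;Pwd=<password>;"
    "ColumnEncryption=Enabled;"  # plaintext is only ever visible on the client side
)
row = conn.execute("SELECT TOP 1 ssn FROM dbo.Patients").fetchone()  # hypothetical table and column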

Implement database encryption

Transparent data encryption (TDE) helps protect Azure SQL Database, Azure SQL Managed Instance, and Azure Data Warehouse against the threat of malicious offline activity by encrypting data at rest. It performs real-time encryption and decryption of the database, associated backups, and transaction log files at rest without requiring changes to the application. By default, TDE is enabled for all newly deployed Azure SQL databases.

Implement Storage Service Encryption

Data in Azure Storage is encrypted and decrypted transparently using 256-bit AES encryption, one of the strongest block ciphers available, and is FIPS 140-2 compliant. Azure Storage encryption is similar to BitLocker encryption on Windows.

Azure Storage encryption is enabled for all new storage accounts, including both Resource Manager and classic storage accounts. Azure Storage encryption cannot be disabled. Because your data is secured by default, you don’t need to modify your code or applications to take advantage of Azure Storage encryption.
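
A short sketch illustrates that transparency: the upload below needs no encryption-specific code because the service encrypts the blob at rest automatically (the storage account and container names are assumptions).

# Sketch: uploading a blob with the Python SDK; Azure Storage encrypts it at rest
# with 256-bit AES transparently, so no encryption code appears here.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",  # placeholder account
    credential=DefaultAzureCredential(),
)
container = service.get_container_client("test")  # assumed container name
container.upload_blob(name="hw.txt", data=b"Hello World!", overwrite=True)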

Implement disk encryption

Azure Disk Encryption helps protect and safeguard your data to meet your organizational security and compliance commitments. It uses the Bitlocker feature of Windows to provide volume encryption for the OS and data disks of Azure virtual machines (VMs), and is integrated with Azure Key Vault to help you control and manage the disk encryption keys and secrets.

Configure application security

Configure SSL/TLS certs

If you purchase an App Service Certificate from Azure, Azure manages the following tasks: it takes care of the purchase process from GoDaddy, performs domain verification of the certificate, maintains the certificate in Azure Key Vault, manages certificate renewal (see Renew certificate), and synchronizes the certificate automatically with the imported copies in App Service apps.

Configure and Manage Key Vault

Manage access to Key Vault

Azure Key Vault is a cloud service that safeguards encryption keys and secrets like certificates, connection strings, and passwords. Because this data is sensitive and business-critical, you need to secure access to your key vaults by allowing only authorized applications and users.

Access to a key vault is controlled through two interfaces: the management plane and the data plane. The management plane is where you manage Key Vault itself. Operations in this plane include creating and deleting key vaults, retrieving Key Vault properties, and updating access policies. The data plane is where you work with the data stored in a key vault. You can add, delete, and modify keys, secrets, and certificates.

To access a key vault in either plane, all callers (users or applications) must have proper authentication and authorization. Authentication establishes the identity of the caller. Authorization determines which operations the caller can execute.

Both planes use Azure Active Directory (Azure AD) for authentication. For authorization, the management plane uses role-based access control (RBAC), and the data plane uses a Key Vault access policy.
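
As a minimal sketch of a data-plane call (the vault URL and secret name are assumptions), an application authenticates through Azure AD and can read a secret only if the vault's access policy allows it:

# Sketch: reading a secret from Key Vault's data plane with the Python SDK.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()  # Azure AD authentication, used by both planes
client = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net",  # placeholder vault URL
    credential=credential,
)
secret = client.get_secret("db-connection-string")  # assumed secret name; requires a data-plane access policy
print(secret.name)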

Welcome to Cloud View!

This week we look at the worldwide IT spending for 2020, aligning IoT and AI with business goals and the Kubernetes bug bounty program.

Tech Insights

IT Spending 2020

Gartner starts the year with some great news – Worldwide IT spending is projected to increase by 3.4% to $3.9 trillion in 2020. Get more insights from the Gartner report here.

Adoption of IoT and AI

To get the full benefits of IoT and AI, their adoption will need to merge with business strategies and goals. And that will change the way businesses are structured.


Industry Insights

Kubernetes-based Red Hat OpenShift 4.3 

Red Hat’s cloud-native commitment stays strong! The company just released its Kubernetes-based Red Hat OpenShift 4.3 and Red Hat OpenShift Container Storage 4 to provide multi-cloud Kubernetes container support.


Kubernetes bug bounty program

The Kubernetes Product Security Committee is launching a new bug bounty program to tap into the power of the highly active Kubernetes community to find vulnerabilities in the software. Find out how you can get started.


From CloudIQ

Introduction to Machine Learning and How It Works

In this article we look at the basics of machine learning, the different algorithm models and a simple machine learning algorithm example using Python.

How to build real-time streaming data pipelines and applications using Apache Kafka?

In this article, we discuss how to use Apache Kafka, the distributed publish-subscribe messaging system, to pass messages from one endpoint to another.

Welcome to Cloud View!

This week we look at some of the technologies of the future and insights on container performance & security, connected vehicles, and chatbots.

Latest in Tech

CES 2020

2020 starts with the biggest tech event of the year – CES 2020. The Las Vegas event showcases the technology of the future! Here are the highlights of some of the enterprise technologies on display.

Technology 2020

Want to know the latest technology trends that will impact businesses in 2020? From the empowered edge to human augmentation and more, here are 15 of them.


Industry Insights

Service Mesh

Containers are dominating the software world, but despite their popularity and orchestration software like Kubernetes, they are still challenging to manage. Service meshes come as the answer to improving container performance and security.


Connected Vehicles

The world of vehicle software is heating up! BlackBerry QNX and AWS are targeting automotive OEMs to bring services, personalization, health monitoring, and advanced driver assistance (ADAS) to vehicles.


Chatbots

In the next couple of years, 70% of white-collar workers will chat with conversational AI platforms daily – predicts Gartner. Here is a case study of improving customer service with an intelligent virtual assistant using IBM Watson.


From CloudIQ

Azure Database for MySQL and Grafana to monitor Azure services

More and more organizations run their business-critical applications on containers using Azure and that calls for a more intuitive dashboard to monitor and track Azure Services.

How to Create and Run Spark Clusters with Qubole using AWS

Qubole is a platform that puts big data on the cloud to power business decisions based on real-time analytics. Here is how you can create and run Spark Clusters with Qubole using AWS.

As more and more organizations run their business-critical applications on containers using Azure, there are new challenges in monitoring and managing them. Of course, there is the Azure dashboard, but with elaborate set-ups and such, IT teams feel the need for a more intuitive dashboard to monitor and track Azure services.

The answer: Grafana.

Grafana is an open-source dashboard and graph editor for Graphite, Elasticsearch, OpenTSDB, Prometheus, and InfluxDB. It is a powerful visualization application that deals effectively with large-scale measurement data and time-series data.

As compared to other dashboards, especially the native Azure dashboard, Grafana offers a wider variety of visualization options (graphs, heatmaps, tables, and more) and can collect and collate data from multiple sources. It is designed for evaluating metrics such as system CPU, memory, disk, and I/O utilization.

A Grafana dashboard will help you understand, analyze, monitor, and explore your data with flexible and fast visualization tools.

In this article, we will look at using Azure Database for MySQL and Grafana to monitor Azure services.

Access Requirements

In your Azure subscription, your account must have “Microsoft.Authorization/*/Write” access to assign an AD app to a role. This access is granted through the “Owner” or “User Access Administrator” role; the “Contributor” role does not have the required permissions.

Virtual machine requirements:
  • VM operating system: Linux (Ubuntu 18.04)
  • VM size: Standard D2s v3 (2 vCPUs, 8 GiB memory) is more than enough
  • SSH access: username and password
  • Default port: 3000
  • NSG rule: open an inbound rule in the network security group with access limited to port 3000 (Grafana) and port 22 (SSH); a CLI sketch follows this list
  • Assign a static public IP address to the VM
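For reference, a hedged CLI sketch of the NSG rule and static public IP (resource names and the source IP are placeholders):

az network nsg rule create \
  --resource-group my-rg --nsg-name grafana-vm-nsg \
  --name allow-grafana-and-ssh --priority 1000 \
  --direction Inbound --access Allow --protocol Tcp \
  --source-address-prefixes <your-ip> \
  --destination-port-ranges 3000 22

az network public-ip create \
  --resource-group my-rg --name grafana-vm-ip --allocation-method Static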

MySQL Creation and Linking to Grafana

1. Create an Azure Database for MySQL server from the Azure portal.

2. Select the resource group, provide Server name, admin username, password, confirm password. Take a note of the password; it is used several times throughout the set-up.

3. To select compute and storage:

a. There are three pricing tiers (choose Basic):

  • Basic
  • General-purpose
  • Memory-optimized

b. Select the appropriate sizes:

  • Compute generation: Gen 5
  • vCore: 1
  • Storage: 5 GB
  • Auto-growth:
  • Backup retention period:
  • Locally redundant / Geo-redundant

For the Basic tier, the maximum is 2 vCores and 1024 GB of storage. Choose as per your needs.

4. Then click Review+Create

5. Once the Azure Database for MySQL server is deployed, go to Connection security and make the following changes:

  • Add a client IP.
  • Set “Allow access to Azure services” to ON.

6. Connect to the MySQL server using the server admin login name and password in MySQL Workbench and create a new query tab. (You can use any tool to connect to MySQL.)

7. Run the following command in the query tab:

  • CREATE DATABASE grafana;

8. The MySQL server-side configuration is now complete. Next, we need to provide the MySQL server details to the Docker container running Grafana.

9. Login to the VM running Grafana using appropriate SSH credentials (password or access keys).

10. Note the following values; they will be saved as environment variables in an environment file (env.list).

Type     = mysql
Host     = <servername>:3306 (MySQL server name created earlier)
Name     = grafana (DB name created in the earlier steps and granted access)
User     = <server admin login name>
Password = <server login password>

11. Save the values above as an environment file (env.list), as shown in the sketch below.
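The original screenshot is not reproduced here; as a sketch, the env.list file maps the values above to Grafana's database settings, which Grafana reads from environment variables of the form GF_<SECTION>_<KEY>:

# env.list (placeholder values)
GF_DATABASE_TYPE=mysql
GF_DATABASE_HOST=<servername>:3306
GF_DATABASE_NAME=grafana
GF_DATABASE_USER=<server admin login name>
GF_DATABASE_PASSWORD=<server login password>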

Installing Grafana as a Docker container along with its required plugins

1. Login to the server using appropriate credentials,

2. Get updates using, sudo apt-get update

3. Install docker using the command, sudo apt install docker.io

4. Enable and start docker,

  • sudo systemctl start docker
  • sudo systemctl enable docker

5. Verify the installation using the command:

  • docker --version

The result will look like this:

6. Now login as root using the command,

  • sudo su

7. Pull the Grafana image; this needs an internet connection, as it downloads the image from the public Docker Hub.

  • docker pull grafana/grafana

8. Run the image with the saved environment variables:

  • docker run -d --name=grafana -p 3000:3000 --env-file ./env.list grafana/grafana

9. Verify the container is running using the “docker ps” command.

10. The next step is to install the plugins for Grafana, which will be used in setting up the dashboard. We need to login to the container created previously to install these plugins.

11. Now create a shell inside the container using,

  • docker exec -it grafana /bin/bash

12. The result will be as shown,

13. By default, Grafana ships with a limited number of panel plugins; to use more visualizations, we can install plugins manually. Copy the plugin installation commands listed below and run them one by one, or all at once.

  • grafana-cli plugins install michaeldmoore-annunciator-panel
  • grafana-cli plugins install grafana-piechart-panel
  • grafana-cli plugins install farski-blendstat-panel
  • grafana-cli plugins install michaeldmoore-multistat-panel
  • grafana-cli plugins install grafana-polystat-panel
  • grafana-cli plugins install flant-statusmap-panel
  • grafana-cli plugins install grafana-clock-panel
  • grafana-cli plugins install neocat-cal-heatmap-panel
  • grafana-cli plugins install briangann-gauge-panel
  • grafana-cli plugins install natel-plotly-panel

14. Once the plugins are installed:

  • verify the installation by going into the /var/lib/grafana/plugins directory using the commands listed below
  • cd /var/lib/grafana/plugins
  • to view the installed plugins, use the ls command

15. Now exit the container, command: exit

16. Now restart the container using:

  • docker container restart grafana (here “grafana” refers to the container name created earlier, which can be found using the docker ps command)

Linking Azure Monitoring Tools

Service Principal

  • Register an app in Azure Active Directory (AD).
  • Create a client secret in the registered app.
  • Go to the subscription → Access control (IAM) → search for the app registration created in Azure AD and grant it the “Reader” role (a CLI sketch follows this list).
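If you prefer the CLI, a single hedged command (the name and subscription ID are placeholders) creates the app registration, a client secret, and the Reader role assignment in one step; the output contains the appId (client ID), password (client secret), and tenant values needed later:

az ad sp create-for-rbac \
  --name grafana-monitoring \
  --role Reader \
  --scopes /subscriptions/<subscription-id>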

Applying the service principal to Grafana:

  • Go to the Grafana UI using the public IP address followed by the port number,
    i.e., (IP address):3000, for example 13.25.49.164:3000
  • Now click Add data source and select Azure Monitor.
  • On the configuration page, enter the tenant ID, client ID, and client secret.
  • Then provide the details for the Log Analytics workspace and Application Insights.
  • If the provided details are correct, a success message is displayed.

Welcome to Cloud View!

We hope you had a joyful and fun holiday! NOW with the festivities behind us, it’s time for business again!

Let’s start with what IT Leaders are planning for 2020.

Leader’s Speak

CIO’s plan for 2020

What are the CIOs thinking and planning for the coming year? It seems finding talent, dealing with rising security problems, and prioritizing the acquisition of new technologies are some of the topics occupying the C-suite.

IT Industry in 2020

Joel Friedman, CTO of Rackspace, offers his take on what 2020 will hold for the IT industry. According to Joel, hybrid and multi-cloud, SaaS, and security problems will all grow.

A decade since the launch of Azure

2020 marks a decade since the launch of Azure. Ever wondered what the founders think about their creation? Here’s a short interview with Microsoft’s Yousef Khalidi and Hoi Vo, key members of the original Azure ‘dream team’.


Industry insights

AI Solutions

As a digital-first service provider, we at CloudIQ help implement AI solutions across organizations. Impact like this is what we aim for! This is what the future of AI looks like.


Top Big Data companies to watch for in 2020

Big Data is going to dominate many a boardroom in the coming few years. We start the year by tracking some promising companies that will define the coming year with their next-generation data management, data science, and machine learning technology.


From CloudIQ

Deploying a Pod containing three applications using Jenkins CI/CD pipeline and updating them selectively

A Kubernetes pod is a layer of abstraction wrapped around containers to group them together for resource allocation and efficient management. Here is how to deploy a pod containing three applications using a Jenkins CI/CD pipeline and update them selectively.

Provisioning Cloud Infrastructure using AWS CloudFormation Templates

Spend less time managing cloud infrastructure and focus on building your application, thanks to AWS CloudFormation templates. Here is a quick start guide to creating the templates for provisioning cloud infrastructure.

A Kubernetes pod – incidentally, some say it is named after a whale pod because the Docker logo is a whale – is the foundational unit of execution in a K8s ecosystem. While Docker is the most common container runtime, pods are container agnostic and support other container runtimes as well.

Simply put, a K8s pod is a layer of abstraction wrapped around containers to group them together to allocate resources and to manage them efficiently.

Continuous integration and delivery or CI/CD is a critical part of DevOps ecosystems, especially for cloud-native applications. DevOps teams frequently use Jenkins CI/CD pipelines to increase the speed and quality of collaborated software development ecosystems by adding automation. Thanks to Helm, deploying Jenkins server to K8s is quick and easy. The difficult bit is building the pipeline.

Here is a post that describes how to deploy a pod containing three applications using a Jenkins CI/CD pipeline and update them selectively.

Task on Hand:

Use a Jenkins pipeline to build a spring-boot application to generate a jar file, dockerize the application to create a Docker image, push the image to a Docker repository, and pull the image into the pod to create containers. The pod should contain 3 containers, one for each of the three applications. Upon git commit, only the container for which there is a change must be updated (rolling update).

Steps
  1. Create a pipeline using a Groovy script to clone the respective git repo, build the project using Maven, build the Docker images, push them to Docker Hub, and pull these images to run containers in the pod.
  2. Repeat the steps for all the three applications in separate stages. Make sure to create a separate directory in each stage to prevent conflicts when using similar files. Also, this clones the different git repos into different folders to avoid confusion.
  3. Here is the Jenkinsfile/Pipeline script to perform the above task:
pipeline {
agent any
stages {
stage('Build1'){
    steps{
        dir('app1'){
            script{
                git 'https://github.com/cloud/simple-spring.git'
                sh 'mvn clean install'
                app = docker.build("cloud007/simple-spring")
                docker.withRegistry( "https://registry.hub.docker.com", "dockerhub" ) {
                // dockerImage.push()
                app.push("latest")
            }
        }
    }
}

}
stage('Build2'){
    steps{
        dir('app2'){
            script{
                git 'https://github.com/cloud/simple-spring-2.git'
                sh 'mvn clean install'
                app = docker.build("cloud007/simple-spring-2")
                docker.withRegistry( "https://registry.hub.docker.com", "dockerhub" ) {
                // dockerImage.push()
                app.push("latest")
            }
        }
    }
}

}
stage('Build3'){
    steps{
        dir('app3'){
            script{
                git 'https://github.com/cloud/simple-spring-3.git'
                sh 'mvn clean install'
                app = docker.build("cloud007/simple-spring-3")
                docker.withRegistry( "https://registry.hub.docker.com", "dockerhub" ) {
                // dockerImage.push()
                app.push("latest")
            }
        }
    }
}
}
stage('Orchestrate')
{
    steps{
        script{
    sh 'kubectl apply -f demo.yaml'
        }
    }
}

}
}

4. Make sure Docker is configured properly: expose the Docker daemon (dockerd) on port 4243 and change the permissions on /var/run/docker.sock so that Jenkins can run docker commands, as shown in the sketch below.
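As a rough sketch, assuming a systemd-based host (paths and the group name may differ in your environment), the daemon can be exposed and the socket permissions adjusted like this; note that exposing dockerd over unauthenticated TCP should be restricted to a trusted network:

# /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:4243

# reload and restart the daemon, then let the jenkins user use the socket
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo usermod -aG docker jenkins
sudo chown root:docker /var/run/docker.sock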

5. Coming to integrating Kubernetes with Jenkins, it can be done using two plugins:

  • Kubernetes plugin: When using this plugin, we configure the credentials to use our local cluster/Azure cluster and specify the container templates for the containers to be created in the pipeline. But since all the tasks must run in containers, it is a somewhat confusing approach. A better approach would be to use the kubernetes-cli plugin.

Refer: https://github.com/jenkinsci/kubernetes-plugin

  • Kubernetes-cli plugin: It provides a withKubeConfig() wrapper for pipeline support, which uses the configured credentials to connect to our cluster and run kubectl commands. However, when running the pipeline, the kubeconfig wasn’t recognized for some reason and kept giving a ‘file not found’ error.

Refer: https://github.com/jenkinsci/kubernetes-cli-plugin/blob/master/README.md

Hence, we installed kubectl on the Jenkins host, configured the cluster manually, and ran shell commands from the Jenkins pipeline, where Jenkins was recognized as an anonymous user and was only granted get access but couldn’t create pods/deployments.

Here are some common problems faced during this process and the troubleshooting procedure.

  • Configuring Jenkins to use the local minikube cluster:
    We had trouble using both plugins to properly configure Jenkins to create deployments as required. Using shell commands to run kubectl was also not successful, since Jenkins was recognized as an anonymous user, and authorization prevented anonymous users from creating deployments.
  • The permission on /var/run/docker.sock is reset to root after every restart, so make sure to change it so that Jenkins can continue to use docker commands (e.g., sudo chown jenkins:docker /var/run/docker.sock).
  • Installing Minikube:
    i) Started the minikube cluster using hyperv as the driver and created a virtual switch:
    minikube start --vm-driver=hyperv --hyperv-virtual-switch="Primary Virtual Switch"

    ii) Installation takes a lot of time, so we have to wait patiently, and eventually, the cluster will get configured and started. If there is a problem with apiserver, then stop the machine after SSHing into minikube vm:
    minikube ssh
    sudo poweroff

    iii) Then start minikube the same way.

Here are some suggested best practices.

Maintaining the git repo:
  • Branching must be used when updating the source code or adding a file to the repository. Suppose you want to add a README: create a new branch from master, add and commit the README, and then merge the branch with the origin.
  • Similarly, for adding test files or changing source code, create a new branch for the testing/modification, update and commit the code, and merge it with master when finished. This allows an easy rollback to the original master if you run into errors while working with the new files and prevents conflicts in code with the master.
  • Commit only after you have tested the code properly; never commit incomplete code.
  • Write good commit messages to keep track of the changes you have made.
Versioning:
  • Follow the versioning convention X.Y.Z where X is incremented for a new major update/feature, Y is incremented for minor updates/minor features and Z for minor patches/bug fixes
  • Avoid version lock that has too many dependencies in a single version. In such a scenario the package can only be updated after releasing new versions for every dependent package.
Docker repo:
  • Use unique tags for deploying/pushing images to the repository.
  • Use stable tags for building images, but never deploy/pull images using stable tags. Stable tags (such as latest) continue to receive updates over time, so the image behind them can change, whereas unique tags are bound to one specific build. A minimal example of this convention follows this list.
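A minimal sketch of this convention, reusing the image name from the pipeline above (the git-SHA tag is just one possible unique-tag scheme):

GIT_SHA=$(git rev-parse --short HEAD)
# build once, apply both a stable tag and a unique tag
docker build -t cloud007/simple-spring:latest -t cloud007/simple-spring:$GIT_SHA .
docker push cloud007/simple-spring:$GIT_SHA    # deploy/pull with the unique tag
docker push cloud007/simple-spring:latest      # stable tag, used only as a build/base reference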

Welcome to Cloud View!

The last two days of 2019 feel a bit like a waiting period – it’s a tad early to start celebrating, but it’s hard to plan anything until the new year celebrations are truly behind us and we are back at work. We think a good way to use this time would be to indulge in a bit of nostalgia and look back at how the technology landscape evolved in 2019.

Recap 2019

Kubernetes Podcast in 2019

If you deal with Kubernetes, then we are sure you follow the Kubernetes Podcast. Here is the roundup of the year’s best! Enjoy!

CRN’s 2019 Year in Review

LOVE ‘Top-10’ listicles? Here is a mammoth list of technology top 10s by CRN. From top 10 cybersecurity stories to the top 10 mobile apps of 2019 – it’s all here!

Top 10 Smarter with Gartner Articles for 2019

No report, survey, or listicle is complete without Gartner. Here are the 10 best “Smarter with Gartner” articles from 2019.


Industry insights

What’s New with AWS

A quick video to recap the latest AWS updates and announcements (there are many!) and a web link, too, in case you want to explore categories in more detail.

IBM Z Open Editor Support for LSP

Any programmer can attest to the power – and the usefulness – of Language Server Protocol (LSP). Now its integration with IBM’s Z Open Editor opens a whole new level for coders across the globe.


From CloudIQ

Creating AWS Security Groups for Kubernetes

We are going to discuss creating security groups in AWS for Kubernetes. The goal is to set up a Kubernetes cluster on AWS EC2, having provisioned your virtual machines.

Deploy a Spring-Boot Application in Kubernetes Pod using Jenkins CI/CD Pipeline

Kubernetes has become the preferred platform of choice for container orchestration. Here is a walkthrough of how Jenkins CI/CD pipeline is used to deploy a spring boot application in K8s.

Signing off now and will see you all in the new year. Party hard and bring in 2020 in style.

Kubernetes has become the platform of choice for container orchestration, delivering maximum operational efficiency. To understand how K8s works, one must understand its most basic execution unit – a pod.

Kubernetes doesn’t run containers directly, rather through a higher-level structure called a pod. A pod has an application’s container (or, in some cases, multiple containers), storage resources, a unique network IP, and options that govern how the container(s) should run.

Pods can hold multiple containers or just one. Every container in a pod will share the same resources and network. A pod is used as a replication unit in Kubernetes; hence, it is advisable to not add too many containers in one pod. Future scaling up would lead to unnecessary and expensive duplication.

To maximize the ease and speed of Kubernetes, DevOps teams like to add automation using Jenkins CI/CD pipelines. Not only does this make the entire process of building, testing, and deploying software go faster, but it also minimizes human error. Here is how Jenkins CI/CD pipeline is used to deploy a spring boot application in K8s.

TASK on Hand:

Create a Jenkins pipeline to dockerize a Spring application, build a Docker image, push it to the Docker Hub repo, and then pull the image into an AKS cluster to run it in a pod.

Complete repository:

All the files required for this task are available in this repository:
https://github.com/saiachyuth5/simple-spring

Pre-Requisites:

A Spring Boot application and a Dockerfile to containerize the application.

STEPS:

1. Install Jenkins :

2. Connect host docker daemon to Jenkins:

  • Run the command chown -R jenkins:docker filename/foldername to allow Jenkins to access Docker.
  • Go to manage Jenkins from browser >Configure System and scroll to the bottom
  • Click the dropdown ‘add cloud’ and add Docker. Add the docker host URI in the format tcp://hostip:4243
  • Click verify connection to check your connection. If everything was done right, the docker version is displayed.

3. Adding global credentials:

  • Go to Credentials on the Jenkins dashboard, click global credentials, and then Add credentials.
  • Select the kind as Microsoft Azure Service Principal and enter the required IDs; similarly, save the Docker credentials under the kind Username with password.

4. Create the Jenkinsfile :

  • Refer to the official Jenkins documentation for the pipeline syntax, usage of Jenkinsfile, and simple examples.
  • Below is the Jenkinsfile used for this task.

Jenkinsfile:

NOTE: While this example uses the actual IDs to log in to Azure, it is recommended to use stored credentials instead of passing the exact parameters (see the sketch after the Jenkinsfile).

pipeline {
environment {
registryCredential = "docker"
}
agent any
stages {
stage('Build') {
    steps{
    script {
        sh 'mvn clean install'
    }
    }
}
stage('Load') {
    steps{
    script {
        app = docker.build("cloud007/simple-spring")
    }
    }
}
    stage('Deploy') {
    steps{
    script {
        docker.withRegistry( "https://registry.hub.docker.com", registryCredential ) {
        // dockerImage.push()
        app.push("latest")
        }
    }
    }
}
stage('Deploy to ACS'){
    steps{
        withCredentials([azureServicePrincipal('dbb6d63b-41ab-4e71-b9ed-32b3be06eeb8')]) {
        sh 'echo "logging in" '
        sh 'az login --service-principal -u **************************** -p ********************************* -t **********************************'
        sh 'az account set -s ****************************'
        sh 'az aks get-credentials --resource-group ilink --name    mycluster'
        sh 'kubectl apply -f sample.yaml'
    }
}
}
}
}
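As the note above suggests, the azureServicePrincipal binding exposes the service principal as environment variables (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID, and AZURE_SUBSCRIPTION_ID, per the Azure Credentials plugin), so the shell steps in the 'Deploy to ACS' stage can avoid hard-coded IDs. A hedged sketch of those shell steps:

az login --service-principal -u $AZURE_CLIENT_ID -p $AZURE_CLIENT_SECRET -t $AZURE_TENANT_ID
az account set -s $AZURE_SUBSCRIPTION_ID
az aks get-credentials --resource-group ilink --name mycluster
kubectl apply -f sample.yaml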

5. Create the Jenkins project:

  • Select New Item > Pipeline and click OK.
  • Scroll to the bottom and select the definition as Pipeline from SCM.
  • Select the SCM as git and enter the git repo to be used, path to Jenkinsfile in Script path.
  • Click apply, and the Jenkins project has now been created.
  • Go to my views, select your view, and click on build to build your project.

6. Create and connect to Azure Kubernetes cluster:

  • Create an Azure Kubernetes cluster with 1-3 nodes and add its credentials to the global credentials in Jenkins.
  • Install the Azure CLI on the Jenkins host machine.
  • Use shell commands in the pipeline to log in, get credentials, and then create a pod using the required YAML file (a CLI sketch follows this list).
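A hedged CLI sketch of that setup, reusing the resource group and cluster name from the Jenkinsfile above (the node count and other options are illustrative):

az aks create --resource-group ilink --name mycluster --node-count 2 --generate-ssh-keys
az aks get-credentials --resource-group ilink --name mycluster
kubectl apply -f sample.yaml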

YAML used:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-helloworld
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spring-helloworld
  template:
    metadata:
      labels:
        app: spring-helloworld
    spec:
      containers:
      - name: spring-helloworld
        image: cloud007/simple-spring:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 80

Here are some common problems faced during this process and the troubleshooting procedure.

  • Corrupt Jenkins exec file:
    Solved by doing an apt purge and then an apt install of Jenkins.
  • Using a 32-bit VM:
    kubectl is not supported on a 32-bit machine, so make sure the system is 64-bit.
  • Installing the Azure CLI manually makes it inaccessible to non-root users:
    Manually installing the Azure CLI placed it in default directories that were not accessible to non-root users, and hence not to Jenkins. So, it is recommended to install the Azure CLI using apt.
  • Installing minikube using a local cluster instead of AKS:
    VirtualBox does not support nested VT-x virtualization and hence cannot run minikube. It is recommended to enable Hyper-V and use it as the driver to run minikube.
  • Naming the stages in the Jenkinsfile:
    For some reason, Jenkins did not accept multi-word stage names such as 'Build Docker Image'. Use a single word like 'Build', 'Load', etc.
  • Jenkins stopped building the project when the system ran out of memory:
    Make sure the host has at least 20 GB free on the hard disk before starting the project.
  • Jenkins couldn't execute docker commands:
    Try the command usermod -aG docker jenkins
  • Spring app not accessible from the external IP:
    We created a new service of type LoadBalancer and assigned it to the pod, after which the application was accessible from the new external IP (see the sketch below).
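A minimal sketch of that fix, assuming the deployment name from the YAML above:

kubectl expose deployment spring-helloworld --type=LoadBalancer --port=80
kubectl get service spring-helloworld    # wait for an EXTERNAL-IP to appear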

In an upcoming article we will show you how to deploy a pod containing three applications using Jenkins ci/cd pipeline and update them selectively.

Welcome to Cloud View!

This week we present to you some fresh articles on the evolution of some of the most dominant technologies for the coming decade.

Predictions

Blockchain in 2020

2019 has been a year of great highs and some lows for the blockchain technology. The next year is going to see some significant growth and consolidation of large blockchain protocols and digital assets.

Cybersecurity in 2020

Everyone agrees that security is going to be ALL IMPORTANT in the coming year. Here is an article that positions 2020 as the year of the breach. We guarantee you will bump security to the top of your list after reading this.


Industry insights

New features in Azure Monitor Metrics Explorer

A few months ago, Microsoft’s Azure clients gave some feedback regarding the use of metrics in Azure Portal. Now the Microsoft team comes back with some new features which address the main concerns of the community.


The Update Framework (TUF)

The ninth to join the CNCF’s list of mature technologies – The Update Framework (TUF) is an open-source technology that secures software update systems.


From CloudIQ

Kubernetes Deployment Controller – An Inside Look

Kubernetes Deployment Controller helps monitor and manage the upgrade, downgrade, and scaling of services without any disruption or downtime. Here’s a detailed look at the inner workings of Kubernetes Deployment Controller.

Implementing Azure AD Pod Identity in AKS Cluster

Cloud-based identity and access management service becomes a necessity for connecting pods in AKS cluster to access other Azure cloud resources and services. Here is a detailed look at how Azure AD Pod Identity helps.

Container and container orchestration have become the default system for any DevOps team that wants to scale on-demand, reduce costs, and deliver faster. And to get the best out of container technology, Kubernetes is the way to go. A recommended Kubernetes practice is to manage pods through a Deployment; this way, they can be monitored and restarted if a failure occurs.

A deployment is created by using a Kubernetes Deployment Controller object. The application (in a container) is deployed to Kubernetes by declaratively passing a desired state to the Kubernetes Deployment Controller. A K8s deployment controller object is utilized for monitoring, management of upgrade, downgrade, and scaling of services (e.g., pods) without any disruption or downtime. This is made possible because the deployment controller is the single source of truth for the sizes of new and old replica sets. It maintains multiple replica sets, and when you describe a desired state, the DC changes the actual state at the correct pace.

Here’s a detailed look at the inner workings of Kubernetes Deployment Controller

The K8s deployment controller is responsible for the following functions:

– Managing a set of pods in the form of Replica Sets & Hash-based labels
– Rolling out new versions of application through new Replica Sets
– Rolling back to old versions of application through old Replica Sets
– Pause & Resume Rollout/Rollback functions
– Scale-Up/Down functions

“The Kubernetes controller manager is a daemon that embeds the core control loops shipped with Kubernetes. In applications of robotics and automation, a control loop is a non-terminating loop that regulates the state of the system. In Kubernetes, a controller is a control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state. Examples of controllers that ship with Kubernetes today are the replication controller, endpoints controller, namespace controller, and serviceaccounts controller.”

func NewControllerInitializers(loopMode ControllerLoopMode) map[string]InitFunc {
        controllers := map[string]InitFunc{}
        controllers["endpoint"] = startEndpointController
        controllers["endpointslice"] = startEndpointSliceController
        controllers["replicationcontroller"] = startReplicationController
        controllers["podgc"] = startPodGCController
        controllers["resourcequota"] = startResourceQuotaController
        controllers["namespace"] = startNamespaceController
        controllers["serviceaccount"] = startServiceAccountController
        controllers["garbagecollector"] = startGarbageCollectorController
        controllers["daemonset"] = startDaemonSetController
        controllers["job"] = startJobController
        controllers["deployment"] = startDeploymentController
        controllers["replicaset"] = startReplicaSetController
        controllers["horizontalpodautoscaling"] = startHPAController
        controllers["disruption"] = startDisruptionController
        controllers["statefulset"] = startStatefulSetController
        controllers["cronjob"] = startCronJobController
        controllers["csrsigning"] = startCSRSigningController
        controllers["csrapproving"] = startCSRApprovingController
        controllers["csrcleaner"] = startCSRCleanerController
        controllers["ttl"] = startTTLController
        controllers["bootstrapsigner"] = startBootstrapSignerController
        controllers["tokencleaner"] = startTokenCleanerController
        controllers["nodeipam"] = startNodeIpamController
        controllers["nodelifecycle"] = startNodeLifecycleController
 	if loopMode == IncludeCloudLoops {
                controllers["service"] = startServiceController
                controllers["route"] = startRouteController
                controllers["cloud-node-lifecycle"] = startCloudNodeLifecycleController
                // TODO: volume controller into the IncludeCloudLoops only set.
        }
        controllers["persistentvolume-binder"] = startPersistentVolumeBinderController
        controllers["attachdetach"] = startAttachDetachController
        controllers["persistentvolume-expander"] = startVolumeExpandController
        controllers["clusterrole-aggregation"] = startClusterRoleAggregrationController
        controllers["pvc-protection"] = startPVCProtectionController
        controllers["pv-protection"] = startPVProtectionController
        controllers["ttl-after-finished"] = startTTLAfterFinishedController
        controllers["root-ca-cert-publisher"] = startRootCACertPublisher

        return controllers
}

Let’s look at the inner workings of the “Deployment” controller. It watches for the following object updates.

func startDeploymentController(ctx ControllerContext) (http.Handler, bool, error) {
        if !ctx.AvailableResources[schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}] {
                return nil, false, nil
        }
        dc, err := deployment.NewDeploymentController(
                ctx.InformerFactory.Apps().V1().Deployments(),
                ctx.InformerFactory.Apps().V1().ReplicaSets(),
                ctx.InformerFactory.Core().V1().Pods(),
                ctx.ClientBuilder.ClientOrDie("deployment-controller"),
        )
        if err != nil {
                return nil, true, fmt.Errorf("error creating Deployment controller: %v", err)
        }
        go dc.Run(int(ctx.ComponentConfig.DeploymentController.ConcurrentDeploymentSyncs), ctx.Stop)
        return nil, true, nil
}

The “Deployment Controller” initializes the following Event handlers.

dInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
                AddFunc:    dc.addDeployment,
                UpdateFunc: dc.updateDeployment,
                // This will enter the sync loop and no-op, because the deployment has been deleted from the store.
                DeleteFunc: dc.deleteDeployment,
        })
        rsInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
                AddFunc:    dc.addReplicaSet,
                UpdateFunc: dc.updateReplicaSet,
                DeleteFunc: dc.deleteReplicaSet,
        })
        podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
                DeleteFunc: dc.deletePod,
        })

Since Kubernetes uses asynchronous programming, the events are processed through work queues and workers.

func (dc *DeploymentController) addDeployment(obj interface{}) {
        d := obj.(*apps.Deployment)
        klog.V(4).Infof("Adding deployment %s", d.Name)
        dc.enqueueDeployment(d)
}

The items from the queue are handled by the “syncDeployment” handler. Some of the functions performed by the handler are shown below.

// List ReplicaSets owned by this Deployment, while reconciling ControllerRef
        // through adoption/orphaning.
        rsList, err := dc.getReplicaSetsForDeployment(d)
	
	// List all Pods owned by this Deployment, grouped by their ReplicaSet.
        // Current uses of the podMap are:
        //
        // * check if a Pod is labeled correctly with the pod-template-hash label.
        // * check that no old Pods are running in the middle of Recreate Deployments.
        podMap, err := dc.getPodMapForDeployment(d, rsList)

	// Update deployment conditions with an Unknown condition when pausing/resuming
        // a deployment. In this way, we can be sure that we won't timeout when a user
        // resumes a Deployment with a set progressDeadlineSeconds.
        if err = dc.checkPausedConditions(d); err != nil {
                return err
        }

	// rollback is not re-entrant in case the underlying replica sets are updated with a new
        // revision so we should ensure that we won't proceed to update replica sets until we
        // make sure that the deployment has cleaned up its rollback spec in subsequent enqueues.
        if getRollbackTo(d) != nil {
                return dc.rollback(d, rsList)
        }

        scalingEvent, err := dc.isScalingEvent(d, rsList)
        if err != nil {
                return err
        }
        if scalingEvent {
                return dc.sync(d, rsList)
        }

        switch d.Spec.Strategy.Type {
        case apps.RecreateDeploymentStrategyType:
                return dc.rolloutRecreate(d, rsList, podMap)
        case apps.RollingUpdateDeploymentStrategyType:
                return dc.rolloutRolling(d, rsList)
        }

Sync is responsible for reconciling deployments on scaling events or when they are paused.

func (dc *DeploymentController) sync(d *apps.Deployment, rsList []*apps.ReplicaSet) error {
        newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(d, rsList, false)
        if err != nil {
                return err
        }
        if err := dc.scale(d, newRS, oldRSs); err != nil {
                // If we get an error while trying to scale, the deployment will be requeued
                // so we can abort this resync
                return err
        }

        // Clean up the deployment when it's paused and no rollback is in flight.
        if d.Spec.Paused && getRollbackTo(d) == nil {
                if err := dc.cleanupDeployment(oldRSs, d); err != nil {
                        return err
                }
        }

        allRSs := append(oldRSs, newRS)
        return dc.syncDeploymentStatus(allRSs, newRS, d)
}


// scale scales proportionally in order to mitigate risk. Otherwise, scaling up can increase the size
// of the new replica set and scaling down can decrease the sizes of the old ones, both of which would
// have the effect of hastening the rollout progress, which could produce a higher proportion of unavailable
// replicas in the event of a problem with the rolled out template. Should run only on scaling events or
// when a deployment is paused and not during the normal rollout process.

func (dc *DeploymentController) scale(deployment *apps.Deployment, newRS *apps.ReplicaSet, oldRSs []*apps.ReplicaSet) error {

 	// If there is only one active replica set then we should scale that up to the full count of the
        // deployment. If there is no active replica set, then we should scale up the newest replica set.
        if activeOrLatest := deploymentutil.FindActiveOrLatest(newRS, oldRSs); activeOrLatest != nil {


	// If the new replica set is saturated, old replica sets should be fully scaled down.
        // This case handles replica set adoption during a saturated new replica set.
        if deploymentutil.IsSaturated(deployment, newRS) {

 // There are old replica sets with pods, and the new replica set is not saturated. 
        // We need to proportionally scale all replica sets (new and old) in case of a
        // rolling deployment.
        if deploymentutil.IsRollingUpdate(deployment) {

		// Number of additional replicas that can be either added or removed from the total
                // replicas count. These replicas should be distributed proportionally to the active
                // replica sets.
                deploymentReplicasToAdd := allowedSize - allRSsReplicas

                // The additional replicas should be distributed proportionally amongst the active
                // replica sets from the larger to the smaller in size replica set. Scaling direction
                // drives what happens in case we are trying to scale replica sets of the same size.
                // In such a case when scaling up, we should scale up newer replica sets first, and
                // when scaling down, we should scale down older replica sets first.

We hope this article helped you understand the inner workings of Kubernetes deployment controller. If you would like to learn more about Kubernetes and get certified, join our 2-day Kubernetes workshop.

Welcome to Cloud View!

The last couple of weeks we have been curating predictions for the coming year (and decade) from well regarded sources. Now it’s time to drill down deeper into specific areas and find out what experts in the field see in store for the future.

Predictions

Cybersecurity: Mitigating cyber-attacks and risks

Forbes has put out a really exhaustive list of predictions (141 to be exact!) in the cyber security realm. These are all from key players and professionals in the digital arena – CIOs, CEOs, CFOs and security heads from across the digital spectrum weigh in with what they think is crucial to mitigate cyber attacks and risks in the coming few years.

Future of DevOps

DevOps is all about bringing the power of collaboration to executing business ideas; turning organizational visions into applications that drive growth and profits. So what does the future hold for the DevOps community?


Industry Speak

AT&T integrating 5G with Microsoft cloud

5G has been in the news for all the wrong reasons, but finally we see some interesting news emerging from the industry. A strategic partnership between Microsoft and AT&T announces that AT&T’s 5G core will run on Azure!


Kubernetes for exponential growth

Containers and container orchestration with Kubernetes are vital for any tech-based business looking to deliver more features – faster and more affordably. Here is a look at how AlphaSense, one of the top AI start-ups leveraged Kubernetes to accelerate growth.


From CloudIQ

Optimizing Azure Cosmos DB Performance

Azure Cosmos DB allows Azure platform users to elastically and independently scale throughput and storage across any number of Azure regions worldwide. Here is an article on how to optimize Cosmos DB performance.

How to Debug and Troubleshoot Common Problems in Kubernetes Deployments

Kubernetes deployment issues are not always easy to troubleshoot. In some cases, the errors can be resolved easily, and, in some cases, detecting errors requires us to dig deeper and run various commands to identify and resolve the issues. Here is a guided tutorial to debug applications that are deployed into Kubernetes.

Welcome to Cloud View!

With the new year 2020 coming closer, the industry is firmly looking towards the future. Cloud news is full of predictions for the year ahead and here’s a quick selection we picked for this week’s reading.

Predictions

Forrester’s cloud computing predictions for 2020

Forrester has an excellent track record of predicting the right cloud trends. And that makes their 2020 cloud computing predictions a MUST-READ. Here is a breakdown from TechRepublic.

Gartner’s top strategic predictions for 2020 and beyond

Last week we put the spotlight on Gartner’s strategic trends; this week we delve further into these to understand how they will affect people, their lives, and their work. Unsurprisingly, AI takes center stage again.


Industry Insights

Telcos embrace containers

Gartner predicts that over 75% of global companies will run containerized applications by 2022. Kubernetes is the leading container orchestration platform for managing these containers. Here is a look at how Telcos are planning to use it to deploy cloud-native 5G networks.


AWS IoT Day – Eight Powerful New Features

AWS regularly puts out bundled themed announcements, which make it easy for us to find relevant information in one place. Here is the one related to AWS IoT Day. Check out 8 powerful AWS features, from secret tunneling to Alexa voice service integration and more.


Interesting announcements from KubeCon

Over 100 announcements were made at KubeCon, here’s a quick read of the 10 most important ones.


This week at CloudIQ

Kubernetes on Azure: A 2-day workshop for AKS developers

Container technology has revolutionized the DevOps landscape and offers organizations the chance to develop and test applications faster and more cost-effectively. CloudIQ’s 2-day hands-on workshop is designed to give DevOps team members the opportunity to skill-up and learn Kubernetes design, deployment, and management.

Configuring Palo Alto Networks Next-Generation Firewall (NGFW) – A Detailed Guide

Today organizations require an enterprise cyber-security platform, which provides network security, cloud security, endpoint protection, & various related cloud-delivered security services. Palo Alto Networks Next-Generation Firewall (NGFW) fits the bill and here is a detailed guide on configuring it.

End-to-end front-end testing has always been a bit of a pain for developers. Testing is one of the critical final steps of any development project, however web testing has tested the patience of all developers at some time or another. The modern web testing ecosystem comes with its own set of challenges – from data security to additional time and expense to managing the dynamic behavior of the contemporary development frameworks. Hence, the need to bring automation to the testing process!

Benefits of Automation Testing
  • Automation increases the speed of test execution
  • Automation helps increase test coverage
  • Automation testing is well suited to regression work
  • Automation testing works well when the GUI stays the same but there are many functional changes

When to use Automation Testing?

Here are some scenarios where Automation testing is highly recommended

  • Requirements do not change frequently
  • The application needs to be tested for load and performance with many virtual users
  • The software is stable with respect to manual testing
  • Availability of time
  • Huge and business-critical projects
  • Projects that need to test the same areas often
Automation testing step by step

There are lots of helpful tools to write automation scripts, however, before using those tools it’s important to identify the process for test automation.

  • Identify areas within the software to automate
  • Choose the appropriate tool for test automation
  • Write test scripts
  • Develop test suits
  • Execute test scripts
  • Build result reports
  • Find possible bugs or performance issues
List of automation tools:

Test automation frameworks help us improve the quality, speed, and accuracy of the testing process. Here is a list of automation tools:

  • Cypress
  • Selenium
  • Protractor
  • Appium(Mobile)
Why choose Cypress:

Cypress solves many of the main testing bottlenecks developers face regularly. It is a JavaScript-based end-to-end testing framework that doesn’t use Selenium (the most widely used testing tool) at all. It is built on top of Mocha, a JavaScript test framework that runs in the browser and makes asynchronous testing simple. Cypress automatically waits for DOM elements to load, elements to become visible, AJAX calls to finish, etc. Hence, we don’t need to use implicit and explicit waits.

Another advantage Cypress offers to developers is that it runs directly in the browser with no network communication. The architecture makes testing and development happen simultaneously. It allows developers access to tools, and they can make changes and see them reflected in real-time. Naturally, this lends more precision and speed to the whole process.

Features of Cypress:
  • Time travel: Cypress takes snapshots as your test runs.
  • Debuggability: Cypress provides readable errors and stack traces, so you can see exactly why a test case has failed.
  • Automatic waiting: There is no need to use wait or sleep, because Cypress automatically waits for your commands.
  • Spies, stubs, and clocks: verify and control the behavior of functions, server responses, and timers.
  • Screenshot and video: Cypress testing automatically takes screenshots when your test case fails and makes a video of the complete result when it is run from the CLI.
Features of Mocha:

Mocha provides the below benefits,

  • Browser support
  • Async & promises support
  • Test coverage reporting
Advantages of Cypress:
  • Open-source
  • It has promise support
  • JavaScript testing framework
  • Easy and reliable testing
  • Fast, free, and open-source
  • Easy to control responses, headers, and status codes
  • Helps you find locators
Installing Cypress:

Installing Cypress is an easy task compared to a Selenium installation. There are two commands used to install Cypress on machines. These are,

  1. npm init
  2. npm install cypress

The first command creates a “package.json” file, and the second command installs Cypress and all its dependencies.

Project Folder Structure Details:

Project folder structure details as below,

*node_modules folder – This is the directory where npm installs the project’s dependencies.

*package.json file – This is the file in the app root that defines which libraries will be installed into node_modules when you run “npm install”.

*cypress folder – It contains folders such as fixtures, integration, plugins, screenshots, support, and videos. These folders are described below:

a. Fixtures – This folder holds external pieces of static data that can be used by your tests.

b. Integration – This folder is used to write the test cases for your app.

c. Screenshots – This folder is used to store screenshots taken during your tests.

d. Videos – It is used to store videos of your test runs.

e. Support – This folder is used to write common commands files.

Write your sample program in Cypress:

Step 1: Open your visual studio code in your machine

Step 2: Create a new Cypress project folder and name it as “cypresse2e”

Step 3: Open the command line and go to the above-created project path.

Step 4: Type the first command under the “Installing Cypress” heading, then wait for it to create the package.json file.

Step 5: After that, type the second command.

Step 6: The above task will finish within 2-3 minutes, after creating the cypress and node_modules folders inside the “cypresse2e” folder. This folder will also contain the package.json file.

Step 7: Click the Cypress folder under “cypresse2e” in vs code.

Step 8: Automation page details are as below,

We will use the CloudIQ home page link for this automation

Step 9: Create the “cypressAutomation.spec.ts” file under integration folder and write the program as seen below in the screenshot,

Program Explanation:

Here is what the test script given above does.

  • Navigate to “cloudiqtech” site.
  • Wait for 10 seconds for the page to load
  • Next click on “AWS” ref link
  • Then navigate to “AWS” page
  • Finally, validate the current page as the AWS page.

Step 10: Open the command terminal, go to your Cypress project path, and run the command shown below.
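The command appears only as a screenshot in the original; it is typically one of the following, run from the project root:

npx cypress open    # opens the interactive Test Runner used in the next steps
npx cypress run     # alternatively, runs the tests headlessly from the CLI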

Step 11: After waiting for 1-2 minutes, it will open the Cypress Test Runner, as shown below in the screenshot. It contains all the tests – the ones you wrote in your automation test as well as the default tests.

Step 12: Click your “cypressAutomation.spec.js” file; this automatically opens the default Chrome browser to run your test and shows the test results in your browser, as below.

Test Result:

*Three tests are passed successfully.

*No tests failed here.

*In total, these three tests ran within 30.44 seconds.

*Screenshots were automatically taken during your tests. If you hover over a testcase in the test run, it will display the screenshot image for every separate testcase.

Welcome to Cloud View!

The cloud computing landscape is evolving, innovating, and expanding almost at the speed of thought! Every week there are hundreds of announcements, insights, opinions, and studies discussing new trends, ideas, and technological breakthroughs. To help you get the most relevant information, we have put together a weekly curated list of must-read articles under Cloud View.

Predictions

Cloud computing in 2020

As 2019 winds down, our eyes turn to the next year. Let’s find out what the pundits are predicting for cloud computing in 2020.
Hybrid cloud is passé as omni-cloud becomes the preferred enterprise approach; Kubernetes is all set to become the dominant talking point in tech conversations in 2020, and AI will become omnipresent.

How will upcoming technology affect humans – at work or at home?

How can IT leaders invest in today’s technology for a future payoff? Here are the top 10 strategic tech trends by Gartner – a look into the future. A must-read before planning for the coming year.

AI and Robotics

No talk of the future is complete without AI and robotics! Keith Shaw, editor-in-chief of Robotics Business Review, shares some insights on where robotics is heading.

Industry Insights

Microsoft and Google Cloud’s battle for the enterprise

Google Cloud is aggressively trying to elbow into the cloud market, leading to a battle royale. All the cloud giants have deep pockets and are not afraid of investing BIG, which is great for innovation and enterprise clients. Check out the whole article for a more in-depth look at how the race for cloud dominance is playing out.  

Confidential Computing

Microsoft brings confidential computing capability to Kubernetes workloads – an additional layer of security to keep business data safe.

This week at CloudIQ

 

CloudIQ Technologies is now a Kubernetes Certified Service Provider (KCSP)

Cloud Native Computing Foundation (CNCF) recognizes CloudIQ as one of the few (122 as on Nov 19, 2019) service providers worldwide to receive KCSP certification. As a KCSP, CloudIQ will be able to access collaborative group support to help its clients develop and deploy cloud native applications quickly and efficiently. The CNCF partnership program also puts CloudIQ in touch with organizations looking to design, adopt and implement cloud native solutions.

 

Understanding Kubernetes Concepts – A QuickStart Guide 

Container technology made software development more agile, however, containers need to be tracked, monitored, and managed, which is where container orchestration and Kubernetes come in. Here is a quick start guide to understanding Kubernetes concepts.

At a Glance

Seattle, WA, November 21, 2019 – CloudIQ Technologies, a fast-growing premier cloud consulting and solutions provider, announced today its status as a Kubernetes Certified Service Provider (KCSP).

Cloud Native Computing Foundation (CNCF) is a non-profit member of the Linux Foundation that promotes cloud-native computing and is the certification body for Kubernetes. CNCF recognizes CloudIQ as one of the few (121 as on date) service providers worldwide to receive KCSP certification.

“We are so thrilled to welcome CloudIQ Technologies into the CNCF family,” said Dan Kohn, Executive Director of the Cloud Native Computing Foundation. “As part of our select group of Kubernetes Certified Services Providers (KCSPs), CloudIQ will be an integral part of our Kubernetes platform outreach and will be available to help organizations successfully adopt Kubernetes.”

As a KCSP, CloudIQ will be able to access collaborative group support to help its clients develop and deploy cloud native applications quickly and efficiently. The CNCF partnership program also puts CloudIQ in touch with organizations looking to design, adopt and implement cloud native solutions.

CloudIQ’s technology experts, who have more than a decade of varied technical and business experience, deliver the full range of cloud-native / open source solutions to its clients through its team of Azure & AWS certified architects, Certified Kubernetes Administrators (CKA), Certified Kubernetes Developers, and DevSecOps professionals.

“We have been extremely happy to work closely with the CNCF community to push the boundaries of the Kubernetes platform further and pass on the benefits of cloud-native technologies to our clients,” said Mr. Prem Kandalu, CEO. “As a part of the Cloud Native Computing Foundation, we will continue to develop new capabilities and strengthen our cloud strategy and business. We hope our contribution to the pooled expertise at CNCF delivers greater speed, scale, economic advantages to developers across the board and leads to the growth of the entire K8s community.”

About CloudIQ Tech:

CloudIQ is a leading cloud consulting and solutions firm with deep industry expertise in building cloud-native solutions that help customers realize the cost, scale, and security benefits of the cloud. As strategic advisors to several Fortune 500 companies, CloudIQ provides operational strategies, monitoring & assistance with command centers, and application transformation guidance for comprehensive cloud migrations.

For more information, visit https://www.cloudiqtech.com, call us at +1 (206) 203-4151, or email [email protected].

Cosmos DB is Microsoft Azure’s hugely successful tool to help their clients manage data on a global scale. This multi-model database service allows Azure platform users to elastically and independently scale throughput and storage across any number of Azure regions worldwide.

As Cosmos DB supports multiple data models, you can take advantage of fast, single-digit-millisecond data access using any of your favorite APIs, including SQL, MongoDB, Cassandra, Tables, or Gremlin. As a NoSQL database, it is familiar territory for anyone with MongoDB experience, while its SQL API lets you query it with existing SQL knowledge.
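For instance, here is a minimal sketch of querying through the SQL API with the azure-cosmos Python SDK (the endpoint, key, and database/container names are placeholders, not values from this article):

from azure.cosmos import CosmosClient

# Placeholder endpoint and key -- substitute your own account values.
ENDPOINT = "https://<your-account>.documents.azure.com:443/"
KEY = "<your-primary-key>"

client = CosmosClient(ENDPOINT, credential=KEY)
database = client.get_database_client("DemoDB")          # assumed database name
container = database.get_container_client("Volcanoes")   # assumed container name

# Plain SQL syntax against a NoSQL store.
items = container.query_items(
    query="SELECT * FROM c WHERE c.Status = 'Holocene'",
    enable_cross_partition_query=True,
)
for item in items:
    print(item)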

Why Cosmos DB?

For organizations looking to build a flexible and scalable database that is globally distributed, Cosmos DB is especially useful as it

  • provides a ready-to-use, extremely dynamic database service
  • guarantees low latency of less than 10 milliseconds for reads and less than 15 milliseconds for writes
  • offers customers a faster, completely seamless experience
  • offers 99.99% availability

Here are some tried and tested tips from our senior Azure expert on how to get the most out of Cosmos DB.

Data Modeling

Cosmos DB is great because it lets you model semi-structured or unstructured aggregates and dynamic entities. This makes it very easy to model ever-changing entities, entities that don’t all share the same attributes, and hierarchical aggregates. To model for Cosmos, you need to think in terms of hierarchy and aggregates instead of entities and relations. NoSQL lets you store a thing that has other things, which have things of their own, and then ask for the whole hierarchy of things back. So, you don’t have a person, rental addresses, and a relation between them. Instead, you have rental records, which aggregate, for each person, the rental addresses they’ve had.
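To make this concrete, here is a hypothetical rental-record aggregate expressed as a single Cosmos DB item (the field names are invented for illustration); the person and all of their rental addresses travel together and come back in one read:

# A hypothetical aggregate: one document per person, with the rental
# addresses nested inside it instead of living in a separate table.
rental_record = {
    "id": "person-1001",
    "name": "Jane Smith",
    "rentalAddresses": [
        {"street": "12 Elm St", "city": "Seattle", "from": "2016-01", "to": "2018-06"},
        {"street": "98 Oak Ave", "city": "Bellevue", "from": "2018-07", "to": None},
    ],
}

# container.upsert_item(rental_record)   # using a container client as in the earlier sketch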

The following NoSQL rules apply equally well to Cosmos DB.

  • Use it as a complement to an existing or additional database.
  • Design with the PACELC theorem, an extension of the CAP theorem, in mind.
  • As a data modeler, think in terms of queries instead of in terms of storage.

Connection Types

An application can connect to Cosmos DB in one of two modes:

  • Gateway mode
  • Direct mode

Gateway mode is the default in the Microsoft.Azure.DocumentDB SDK; it uses HTTPS over a single endpoint. Direct mode is the default in the .NET V3 SDK and uses both TCP and HTTPS for connectivity.

Gateway mode works better when your application runs within a corporate network with strict network rules, because its single endpoint can be configured in the firewall for security. However, gateway mode performance is lower compared to direct mode.

There is also an option to connect through the RESTful programming model provided by the SDK. All CRUD operations can be done through REST calls. This method is recommended if you need a client app to access the database directly instead of going through an API. It removes the overhead of building an API wrapper around Cosmos DB and avoids the corresponding performance penalty.

In most scenarios, direct mode is the recommended option, as it provides better performance.

I am using the popular volcano dataset to compare the response times of the SDK and the RESTful model.

Query executed in both versions:

SELECT * FROM c where c.Status="Holocene"
Response details of the SDK

The query returned the data in 3710 ms.

Response details of the RESTful model

The query returned the data in 5810 ms.

If we build an API using this mode, then the response time of our API also needs to be considered. So, using the RESTful model inside an API is a trade-off against performance. Use this mode when querying directly from the client.

Partitioning the DB

The logical partition key is the primary lever for getting good performance out of Cosmos DB transactions. For example, suppose you have a database holding around 1,500 student records for a school. A simple search for a student named “Peter” would scan all 1,500 entries, consuming a lot of throughput to get the result. Now split the data logically by the “Grade” the students belong to. A query for a student named “Peter” in “Grade 5” now only searches the 30 or 40 students in that grade out of the total 1,500, saving throughput and improving performance compared to the earlier approach.
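A rough sketch of the same idea with the azure-cosmos Python SDK (the account, database, container, and property names here are invented for illustration):

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
database = client.get_database_client("SchoolDB")   # assumed database name

# Create a container partitioned on the hypothetical "/grade" property.
students = database.create_container_if_not_exists(
    id="Students",
    partition_key=PartitionKey(path="/grade"),
)

# Scoping the query to one logical partition ("Grade 5") means Cosmos DB only
# scans the few dozen documents in that partition instead of all 1,500 students.
results = students.query_items(
    query="SELECT * FROM c WHERE c.name = 'Peter'",
    partition_key="Grade 5",
)
for student in results:
    print(student)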

Common guidelines for choosing a partition key:

 a. Any property such as city, state, or country can be used as the partition key.
 b. No partition key is required for containers up to 10 GB.
 c. The query should be provided with the partition key to be searched.
 d. The property selected as the partition key must be present in all the documents in the container.

I am using the popular volcano dataset to test the performance with and without a partition key.

1. I initially created a collection without a partition key. The performance for the given query is

SELECT * FROM c where c.Status="Holocene"
Resultset
Metric                                        Value
Partition key range id                        0
Retrieved document count                      200
Retrieved document size (in bytes)            100769
Output document count                         200
Output document size (in bytes)               101069
Index hit document count                      200
Index lookup time (ms)                        0.21
Document load time (ms)                       1.29
Query engine execution time (ms)              0.33
System function execution time (ms)           0
User defined function execution time (ms)     0
Document write time (ms)                      0.52

2. Then I recreated the same collection with “/country” as the partition key. The same query, run with “Japan” as the partition key value, now returns the following metrics.

Resultset
Metric                                        Value
Partition key range id                        0
Retrieved document count                      16
Retrieved document size (in bytes)            7887
Output document count                         16
Output document size (in bytes)               7952
Index hit document count                      16
Index lookup time (ms)                        0.23
Document load time (ms)                       0.17
Query engine execution time (ms)              0.06
System function execution time (ms)           0
User defined function execution time (ms)     0
Document write time (ms)                      0.01

Tune the Index

Indexing is always a top-priority item on the checklist when tuning performance. Indexing is an internal job that keeps track of metadata about the data, which helps in finding the result set for a query. By default, all the properties of a Cosmos container are indexed. But this is not always necessary: it adds useless overhead to the DB, and keeping track of so much data consumes extra RUs, which is not cost-effective. The better approach is to exclude all paths from indexing and include only the paths that are actually used for querying in the application.
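A minimal sketch of such a custom indexing policy with the azure-cosmos Python SDK (the included paths are illustrative; adapt them to whatever properties your queries actually filter on):

from azure.cosmos import PartitionKey

def create_volcano_container(database):
    """Create a container that indexes only the paths the application queries on."""
    indexing_policy = {
        "indexingMode": "consistent",
        "includedPaths": [
            {"path": "/Status/?"},    # illustrative: properties used in query filters
            {"path": "/Country/?"},
        ],
        "excludedPaths": [
            {"path": "/*"},           # everything else stays out of the index
        ],
    }
    return database.create_container_if_not_exists(
        id="VolcanoData",
        partition_key=PartitionKey(path="/Country"),
        indexing_policy=indexing_policy,
    )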

Indexing Mode                              With Default Indexing    With Custom Indexing
RU’s Consumed                              3146.15                  19.83
Output Doc Count                           100                      100
Doc load time (in ms)                      646.51                   2.32
Query engine execution time (in ms)        434.26                   4.96
System function execution time (in ms)     57.03                    2.41

Paging

By default, a query returns 100 documents per execution. We can increase this by providing a “maxItemCount” value, up to a maximum of 1000 documents. However, it rarely makes sense to fetch 1000 documents at a time from the DB except in a few scenarios. To improve performance and show a crisp result set to the user, keep “maxItemCount” small. Unlike SQL databases, pagination is the default behavior in Cosmos DB. Even if you set the maximum count to 1000 and the result set for the query is larger, Cosmos returns a “continuation token”. This token uniquely identifies the query and the page position, so the front end can implement its pagination logic on top of it. By reducing the number of documents per response, we save throughput, reduce network traffic, and increase performance.
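As an illustrative sketch with the azure-cosmos Python SDK, the page size can be capped and the continuation token handed back to the caller to fetch the next page later (the page size of 50 is arbitrary):

def fetch_page(container, query, continuation_token=None, page_size=50):
    """Return one page of results plus the token needed to fetch the next page."""
    pager = container.query_items(
        query=query,
        enable_cross_partition_query=True,
        max_item_count=page_size,          # cap documents per round trip
    ).by_page(continuation_token)          # resume from a previous page if a token is given
    page = list(next(pager))
    return page, pager.continuation_token

# First call: no token. Subsequent calls pass the token returned previously.
# page1, token = fetch_page(container, "SELECT * FROM c WHERE c.Status = 'Holocene'")
# page2, token = fetch_page(container, "SELECT * FROM c WHERE c.Status = 'Holocene'", token)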

Throughput Management

RUs, or Request Units, is the term we constantly come across when using Cosmos DB. When you read a document from a container or write a document to it, you are trading RUs with Cosmos for your operation. It is like currency: without money you can’t buy anything, and without RUs you can’t query anything. You can buy only items that cost as much as or less than the money in your hand; similarly, you can only run queries that cost no more than the RUs you have.

If you have a large amount of data and the query needs to traverse deep into the collection, then you need enough RUs. Every property added to the index consumes some RUs, so if all properties are indexed, your RU overhead will be high, and you will run short of RUs for querying. So, always index only the properties that are needed while querying.

Index properly, save RUs, and spend them on querying. For example, suppose a container holds 100K documents, and with default indexing a query exhausts its 1,000 RUs after covering only 50K documents; the query never reaches the remaining 50K documents, and they never appear in the result set. Properly indexed, the same query can cover all 100K documents for only around 400 RUs.
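To see what a given query actually costs, the request charge reported by the service can be inspected after the call. Here is a rough sketch with the azure-cosmos Python SDK (the client_connection attribute and header name are as the SDK and service expose them at the time of writing):

def query_and_report_cost(container, query):
    """Run a query and print how many RUs the service charged for it."""
    items = list(container.query_items(
        query=query,
        enable_cross_partition_query=True,
    ))
    # Cosmos DB reports the cost of the last operation in this response header.
    charge = container.client_connection.last_response_headers.get("x-ms-request-charge")
    print(f"{len(items)} documents returned, {charge} RUs consumed")
    return items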

Startup latency

The very first query is always a bit slow because of the time it takes to establish the connection. To avoid this latency, it is best practice to call “OpenAsync()” (in SDK 2) once at startup when creating the connection.

await client.OpenAsync();
Singleton Connection

The best approach is to connect to the DB once and keep the connection alive for all instances of the application. Polling the DB periodically also keeps the connection alive. This reduces DB connectivity latency.

Regions

Make sure Cosmos DB and the applications that use it are deployed in the same Azure region; this reduces latency considerably. The lowest possible latency is achieved by ensuring the calling application is located within the same Azure region as the provisioned Azure Cosmos DB endpoint.

Programming Best Practices
  • Always use the latest SDK version.
  • Use the Streaming API (in SDK 3), which can receive and return data without serializing it. This is helpful when your API is just a relay and does not perform any logic on the data.
  • Tune your queries.
  • Implement retry logic with a reasonable wait time to avoid being throttled during busy periods (a rough sketch follows this list).
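A minimal sketch of such retry logic in Python, assuming the azure-cosmos SDK (HTTP status 429 is the throttling signal; the wait times here are arbitrary):

import time
from azure.cosmos.exceptions import CosmosHttpResponseError

def query_with_retry(container, query, attempts=5):
    """Retry a throttled query (HTTP 429) with exponential backoff."""
    delay = 1  # seconds; arbitrary starting point
    for attempt in range(attempts):
        try:
            return list(container.query_items(
                query=query,
                enable_cross_partition_query=True,
            ))
        except CosmosHttpResponseError as err:
            if err.status_code != 429 or attempt == attempts - 1:
                raise              # not throttling, or out of attempts
            time.sleep(delay)
            delay *= 2             # back off before trying again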

By carefully analyzing all the above factors, we can improve the Cosmos DB query performance substantially.

Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery, and other functionalities to help businesses scale and grow.

It gives organizations a secure and robust platform to develop their custom cloud-based solutions and has several unique features that make it one of the most reliable and flexible cloud platforms, such as:

  • Mobile-friendly access through AWS Mobile Hub and AWS Mobile SDK
  • Fully managed purpose-built Databases
  • Serverless cloud functions
  • Range of storage options that are affordable and scalable.
  • Unbeatable security and compliance

Following are some core services offered by AWS:

AWS Core services
  1. An EC2 instance is a virtual server in Amazon’s Elastic Compute Cloud (EC2) for running applications on the AWS infrastructure.
  2. Amazon Elastic Block Store (EBS) is a cloud-based block storage system provided by AWS that is best used for storing persistent data.
  3. Amazon Virtual Private Cloud (Amazon VPC) enables us to launch AWS resources into a virtual network that we have defined. This virtual network closely resembles a traditional network that we would operate in our own data center, with the benefits of using the scalable infrastructure of AWS.
  4. Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network.
  5. AWS security groups (SGs) are associated with EC2 instances and provide security at the protocol and port access level. Each security group — working much the same way as a firewall — contains a set of rules that filter traffic coming into and out of an EC2 instance.

Let us look more deeply at one of AWS’s core services – AWS CloudFormation – that is key for managing workloads on AWS.

1.   CloudFormation

AWS CloudFormation is a service that helps us model and set up our Amazon Web Services resources so that we can spend less time managing those resources and more time focusing on our applications that run in AWS.  We create a template that describes all the AWS resources that we want (like Amazon EC2 instances or S3 buckets), and AWS CloudFormation takes care of provisioning and configuring those resources for us. We don’t need to individually create and configure AWS resources and figure out what’s dependent on what; AWS CloudFormation handles all of that.

A stack is a collection of AWS resources that you can manage as a single unit. In other words, we can create, update, or delete a collection of resources by creating, updating, or deleting stacks. All the resources in a stack are defined by the stack’s AWS CloudFormation template.

2.   CloudFormation template

CloudFormation templates can be written in either JSON or YAML.  The structure of the template in YAML is given below:

---
AWSTemplateFormatVersion: "version date"

Description:
  String
Metadata:
  template metadata
Parameters:
  set of parameters
Mappings:
  set of mappings
Conditions:
  set of conditions
Resources:
  set of resources
Outputs:
  set of outputs

In the above YAML file:

  1. AWSTemplateFormatVersion – The AWS CloudFormation template version that the template conforms to.
  2. Description – A text string that describes the template.
  3. Metadata – Objects that provide additional information about the template.
  4. Parameters – Values to pass to our template at runtime (when we create or update a stack). We can refer to parameters from the Resources and Outputs sections of the template.
  5. Mappings – A mapping of keys and associated values that we can use to specify conditional parameter values, like a lookup table. We can match a key to a corresponding value by using the Fn::FindInMap intrinsic function in the Resources and Outputs sections.
  6. Conditions – Conditions that control whether certain resources are created or whether certain resource properties are assigned a value during stack creation or update. For example, we can conditionally create a resource that depends on whether the stack is for a production or test environment.
  7. Resources – Specifies the stack resources and their properties, such as an Amazon Elastic Compute Cloud instance or an Amazon Simple Storage Service bucket.  We can refer to resources in the Resources and Outputs sections of the template.
  8. Outputs – Describes the values that are returned whenever we view our stack’s properties. For example, we can declare an output for an S3 bucket name and then call the AWS cloudformation describe-stacks AWS CLI command to view the name.

Resources is the only required section in the CloudFormation template.  All other sections are optional.

3.   CloudFormation template to create S3 bucket

S3template.yml

Resources:
  HelloBucket:
    Type: AWS::S3::Bucket

In AWS Console, go to CloudFormation and click on Create Stack

Upload the template file which we created.  This will get stored in an S3 location, as shown below.

Click next and give a stack name

Click Next and then “Create stack”.  After a few minutes, you can see that the stack creation is completed.

Clicking on the Resources tab, you can see that the S3 bucket has been created with the name “s3-stack-hellobucket-buhpx7oucrgn”.  AWS generated this name since we didn’t specify the BucketName property in the YAML.

Note that deleting the stack will delete the S3 bucket which it had created.

4.   Intrinsic functions

AWS CloudFormation provides several built-in functions that help you manage your stacks.

In the below example, we create two resources – a Security Group and an EC2 Instance, which uses this Security Group.  We can refer to the Security Group resource using the !Ref function.

Ec2template.yml

Resources:
  Ec2Instance:
    Type: 'AWS::EC2::Instance'
    Properties:
      SecurityGroups:
        - !Ref InstanceSecurityGroup
      KeyName: mykey
      ImageId: ''
  InstanceSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '22'
          ToPort: '22'
          CidrIp: 0.0.0.0/0

Some other commonly used intrinsic functions are

  1. Fn::GetAtt – returns the value of an attribute from a resource in the template.
  2. Fn::Join – appends a set of values into a single value, separated by the specified delimiter. If a delimiter is an empty string, the set of values are concatenated with no delimiter.
  3. Fn::Sub – substitutes variables in an input string with values that you specify. In our templates, we can use this function to construct commands or outputs that include values that aren’t available until we create or update a stack.
5.   Parameters

Parameters enable us to pass custom values to our template each time we create or update a stack.

TemplateWithParameters.yaml

Parameters: 
  InstanceTypeParameter: 
    Type: String
    Default: t2.micro
    AllowedValues: 
      - t2.micro
      - m1.small
      - m1.large
    Description: Enter t2.micro, m1.small, or m1.large. Default is t2.micro.
Resources:
  Ec2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType:
        Ref: InstanceTypeParameter
      ImageId: ami-0ff8a91507f77f867
6.   Pseudo Parameters

Pseudo parameters are parameters that are predefined by AWS CloudFormation. We do not declare them in our template. Use them the same way as we would a parameter as the argument for the Ref function.

Commonly used pseudo parameters:

  1. AWS::Region – Returns a string representing the AWS Region in which the encompassing resource is being created, such as us-west-2.
  2. AWS::StackName – Returns the name of the stack as specified during cloudformation create-stack, such as teststack.
7.   Mappings

The optional Mappings section matches a key to a corresponding set of named values. For example, if we want to set values based on a region, we can create a mapping that uses the region name as a key and contains the values we want to specify for each region. We use the Fn::FindInMap intrinsic function to retrieve values in a map.

We cannot include parameters, pseudo parameters, or intrinsic functions in the Mappings section.

TemplateWithMappings.yaml

AWSTemplateFormatVersion: "2010-09-09"
Mappings: 
  RegionMap: 
    us-east-1:
      HVM64: ami-0ff8a91507f77f867
      HVMG2: ami-0a584ac55a7631c0c
    us-west-1:
      HVM64: ami-0bdb828fd58c52235
      HVMG2: ami-066ee5fd4a9ef77f1
    eu-west-1:
      HVM64: ami-047bb4163c506cd98
      HVMG2: ami-0a7c483d527806435
    ap-northeast-1:
      HVM64: ami-06cd52961ce9f0d85
      HVMG2: ami-053cdd503598e4a9d
    ap-southeast-1:
      HVM64: ami-08569b978cc4dfa10
      HVMG2: ami-0be9df32ae9f92309
Resources: 
  myEC2Instance: 
    Type: "AWS::EC2::Instance"
    Properties: 
      ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", HVM64]
      InstanceType: m1.small
8.   Outputs

The optional Outputs section declares output values that we can import into other stacks (to create cross-stack references), return in response to describe-stack calls, or view on the AWS CloudFormation console. For example, we can output the S3 bucket name for a stack to make the bucket easier to find.

In the below example, the output named StackVPC returns the ID of a VPC, and then exports the value for cross-stack referencing with the name VPCID appended to the stack’s name.

Outputs:
  StackVPC:
    Description: The ID of the VPC
    Value: !Ref MyVPC
    Export:
      Name: !Sub "${AWS::StackName}-VPCID"

As organizations start to create and maintain clusters in AKS (Azure Kubernetes Service), they also need a cloud-based identity and access management service to access other Azure cloud resources and services. Azure Active Directory (AAD) pod identity is a service that gives users this control by assigning identities to individual pods.

Without these controls, accounts may get access to resources and services they don’t require. And it can also become hard for IT teams to track which set of credentials were used to make changes.

Azure AD Pod identity is just one small part of the container and Kubernetes management process and as you delve deeper, you will realize the true power that Kubernetes and Containers bring to your DevOps ecosystem.

Here is a more detailed look at how to use AAD pod identity for connecting pods in AKS cluster with Azure Key Vault.

Pod Identity

Integrate your key management system with Kubernetes using pod identity. Secrets, certificates, and keys in a key management system become a volume accessible to pods. The volume is mounted into the pod, and its data is available directly in the container file system for your application.

On an existing AKS cluster –

Deploy Key Vault FlexVolume to your AKS cluster with this command:

  • kubectl create -f https://raw.githubusercontent.com/Azure/kubernetes-keyvault-flexvol/master/deployment/kv-flexvol-installer.yaml
1. Create the Deployment

Run this command to create the aad-pod-identity deployment on an RBAC-enabled cluster:

  • kubectl apply -f https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/deployment-rbac.yaml

Or run this command to deploy to a non-RBAC cluster:

  • kubectl apply -f https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/deployment.yaml
2. Create an Azure Identity

Create azure managed identity

Command:- az identity create -g ResourceGroupNameOfAKsService -n aks-pod-identity(ManagedIdentity)

Output:-  

{
  "clientId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "clientSecretUrl": "https://control-westus.identity.azure.net/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/aks_dev_rg_wu/providers/Microsoft.ManagedIdentity/userAssignedIdentities/aks-pod-identity/credentials?tid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&oid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx&aid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "id": "/subscriptions/xxxxxxxx-xxxx-XXXX-XXXX-XXXXXXXXXXXX/resourcegroups/aks_dev_rg_wu/providers/Microsoft.ManagedIdentity/userAssignedIdentities/aks-pod-identity",
  "location": "westus",
  "name": "aks-pod-identity",
  "principalId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "resourceGroup": "au10515_aks_dev_rg_wu",
  "tags": {},
  "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX",
  "type": "Microsoft.ManagedIdentity/userAssignedIdentities"
}

Assign Cluster SPN Role

Command for getting the AKSServicePrincipalID:- az aks show -g <resourcegroup> -n <name> --query servicePrincipalProfile.clientId -o tsv

Command:- az role assignment create --role "Managed Identity Operator" --assignee <AKSServicePrincipalId> --scope <ID of Managed identity>

Assign Azure Identity Roles

Command:- az role assignment create --role Reader --assignee <Principal ID of Managed identity> --scope <KeyVault Resource ID>

Set policy to access keys in your Key Vault

Command:- az keyvault set-policy -n dev-kv --key-permissions get --spn <Client ID of Managed identity>

Set policy to access secrets in your Key Vault

Command:- az keyvault set-policy -n dev-kv --secret-permissions get --spn <Client ID of Managed identity>

Set policy to access certs in your Key Vault

Command:- az keyvault set-policy -n dev-kv --certificate-permissions get --spn <Client ID of Managed identity>

3. Install the Azure Identity

Save this Kubernetes manifest to a file named aadpodidentity.yaml:

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
name: <a-idname>
spec:
type: 0
ResourceID: /subscriptions/<subid>/resourcegroups/<resourcegroup>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<name>
ClientID: <clientId>

Replace the placeholders with your user identity values. Set type: 0 for user-assigned MSI or type: 1 for Service Principal.

Finally, save your changes to the file, then create the AzureIdentity resource in your cluster:

kubectl apply -f aadpodidentity.yaml

4. Install the Azure Identity Binding

Save this Kubernetes manifest to a file named aadpodidentitybinding.yaml:

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: demo1-azure-identity-binding
spec:
  AzureIdentity: <a-idname>
  Selector: <label value to match>

Replace the placeholders with your values. Ensure that the AzureIdentity name matches the one in aadpodidentity.yaml.

Finally, save your changes to the file, then create the AzureIdentityBinding resource in your cluster:

kubectl apply -f aadpodidentitybinding.yaml

Sample Nginx Deployment for accessing key vault secret using Pod Identity

Save this sample nginx pod manifest file named nginx-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: nginx-flex-kv-podid
    aadpodidbinding: 
  name: nginx-flex-kv-podid
spec:
  containers:
  - name: nginx-flex-kv-podid
    image: nginx
    volumeMounts:
    - name: test
      mountPath: /kvmnt
      readOnly: true
  volumes:
  - name: test
    flexVolume:
      driver: "azure/kv"
      options:
        usepodidentity: "true"         # [OPTIONAL] if not provided, will default to "false"
        keyvaultname: ""               # the name of the KeyVault
        keyvaultobjectnames: ""        # list of KeyVault object names (semi-colon separated)
        keyvaultobjecttypes: secret    # list of KeyVault object types: secret, key or cert (semi-colon separated)
        keyvaultobjectversions: ""     # [OPTIONAL] list of KeyVault object versions (semi-colon separated), will get latest if empty
        resourcegroup: ""              # the resource group of the KeyVault
        subscriptionid: ""             # the subscription ID of the KeyVault
        tenantid: ""            # the tenant ID of the KeyVault
Azure AD Pod Identity: points to remember when implementing it in a cluster
  • Azure AD Pod Identity is currently bound to the default namespace. Deploying an Azure Identity and its binding to other namespaces will not work!
  • Pods from all namespaces can be executed in the context of an Azure Identity deployed to the default namespace (related to point 1).
  • Every pod developer can add the aadpodidbinding label to his/her pod and use your Azure Identity.
  • Azure Identity Binding does not use the default Kubernetes label selection mechanism.

There is little doubt that data will guide the next generation of business strategy and will bring new efficiencies across industries. But for that to happen, organizations must be able to extract insights from their data.

Qubole is an ideal platform to activate end-to-end data processing in organizations. It combines all types of data – structured, unstructured, and legacy offline data – into a single data pipeline and turns it into rich insights by adding AI, ML, and deep analytics tools to the mix.

It scales seamlessly to accommodate more users and new data without adding administrative overheads and lowers cloud costs significantly. Simply put, Qubole is a platform that puts big data on the cloud to power business decisions based on real-time analytics.

At CloudIQ Technologies, our data experts have deployed Qubole’s cloud-native data systems for many of our clients, and the results have been outstanding. Here is an article from one of our data engineers that provides an overview of how to set up Qubole to use an AWS environment and to create and run Spark clusters.

AWS Access Configuration:

In order for Qubole to create and run a cluster, we have to grant Qubole access to our AWS environment. We can grant access based on a key or a role. We will use role-based authentication.

Step 1: Login to Qubole

Step 2: Click on the menu at the top left corner and select “Account Settings” under the Control Panel.

Step 3: Scroll down to Access settings

Step 4: Switch Access mode to “IAM Role”

Step 5: Copy the Trusted Principal AWS Account ID and External ID

Step 6: Use the copied values to create a QuboleAccessRole in the AWS account (using the cloudformation template)

Step 7: Copy the Role ARN of the QuboleAccessRole and enter it in the Role ARN field

Step 8: Enter the S3 bucket location where the Qubole metadata will be stored in the “Default Location” field.

Step 9: Click Save

Spark Cluster
Create a cluster

The below steps will help create a new Spark cluster in Qubole.

Step 1: Click on the top-left dropdown menu and select “Cluster”

Step 2: Click on “+New” button

Step 3: Select “Spark” and click “Next”

Step 4: Provide a name for the cluster in the “Cluster Labels” field

Step 5: Select the version of Spark to run, Master Node Type, Worker Node Type, Minimum and Maximum nodes

Step 6: Select Region as us-west-2

Step 7: Select Availability Zone as us-west-2a

Step 8: Click “Next”

Step 9: In the Composition screen, you can select the type of nodes that will be spun up.

Step 10: In the Advanced Configuration screen, proceed to EC2 settings

Step 11: Enter “QuboleDualIAMRole” in the “Instance Profile” field

Step 12: Select “AppVPC” in VPC field

Step 13: Select “AppPrivateSNA” under Subnet field

Step 14: Enter the ip address of the Bastion node in the “Bastion Node” field

Step 15: Scroll to the bottom and enter “AppQuboleClusterSG” (security group for the cluster) in the “Persistent Security Group” field

Step 16: Click on “Create”

Run a cluster

To start a cluster, click on the dropdown menu on the top left corner and select cluster. Now click on “Start” button next to the cluster that needs to be started. A cluster is also automatically started when a job is submitted for the cluster.

Submit a job

One of the simplest ways to run a Spark job is to submit it through the workbench. You can navigate to the workbench from the drop-down menu at the top left corner. In the workbench, click on “+Create New”. Then select “Spark” next to the title of the job. Once you select Spark, an optional drop-down appears where you can choose “Python”. In the last drop-down menu, select the Spark cluster where you want to execute the job. If this cluster is not active, it will be activated automatically. Enter your Spark job in the editor window. When complete, click on “Run” to run the job.
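For example, a minimal PySpark script you might paste into the workbench could look like the one below (the sample data is made up purely for illustration; any valid PySpark job is submitted the same way):

from pyspark.sql import SparkSession

# Qubole supplies the Spark runtime on the cluster; building the session
# explicitly keeps the script runnable outside the workbench as well.
spark = SparkSession.builder.appName("workbench-demo").getOrCreate()

# Made-up sample data, just to have something to aggregate.
orders = spark.createDataFrame(
    [("north", 120.0), ("south", 75.5), ("north", 42.25)],
    ["region", "amount"],
)

orders.groupBy("region").sum("amount").show()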

Airflow Cluster

Airflow scheduler can be used to run various jobs in a sequence. Let’s take a look at configuring an Airflow cluster in Qubole.

Setting up DataStore

The first step in creating an airflow cluster is to set up a datastore. Make sure that the MySQL db is up and running and contains a database for airflow. Now, select “Explore” from the dropdown menu at the top left corner. On the left hand menu, drop down the selection menu showing “Qubole Hive” and select “+Add Data Store”

In the new screen, provide a name for the data store. Select “MySQL” as the database type. Enter the database name for the airflow database (The database should already be created in MySQL). Enter the host address as “hmklabsbienvironment.cq8z1kp7ikd8.us-west-2.rds.amazonaws.com”. Enter the username and password. Make sure to select “Skip Validation”. Since the MySQL db is in a private VPC, Qubole does not have access to it and will not be able to validate.

Configuring Airflow Cluster

Step 1: Click on the top left drop-down menu and select “Cluster”

Step 2: Click on “+New” button

Step 3: Select “Airflow” in the type of cluster and click “Next”

Step 4: Give a cluster name. Select the airflow version, node type.

Step 5: Select the datastore which points to the MySQL

Step 6: Select the us-west-2 as the Region

Step 7: Select us-west-2a as the Availability zone

Step 8: Click next to go to Advanced Configuration

Step 9: Select AppVPC as the VPC

Step 10: Select AppPrivateSNA as the Subnet

Step 11: Enter the Bastion Node information

Step 12: Scroll to the bottom and enter AppQuboleClusterSG as the Persistent Security Groups

Step 13: Click on create

Once the cluster is created, you can run it by clicking on “Start” next to the cluster’s name.

Containers are being embraced at a breakneck speed – developers love them, and they are great for business because they deliver speed and scale in a cost-efficient manner. So much so, that container technology seems to be overtaking VMs – especially with container orchestration tools like Kubernetes, making them simpler to manage and extracting higher efficiency and speed from them.

Kubernetes cluster architecture

Kubernetes provides an open-source platform for simplifying multi-cloud environments. The disparities between different cloud providers are a roadblock for developers and Kubernetes helps by streamlining and standardizing container-based applications.

Kubernetes clusters are the architectural foundation that drives this simplicity and makes it possible for users to get the functionality they need at scale and with ease. Here are some of the functionalities of Kubernetes –

  • Kubernetes distributes workloads efficiently across all available resources and cushions the impact of traffic spikes or outages.
  • It simplifies application deployment regardless of the size of the cluster.
  • It automates horizontal scaling.
  • It monitors for app failure with constant node and container health checks and performs self-healing and replication to resolve any failures.

All this makes the work of developers faster and frees up their time and attention from trivial repetitive tasks allowing them to build applications better and faster! For the organization, the benefits are three-fold – higher productivity, better products and, finally, cost efficiencies.

Let’s move to the specifics now and find out how to set up a Kubernetes Cluster on the RHEL 7.6 operating system on AWS.

Prerequisites:
  • You should have a VPC available.
  • A subnet within that VPC, into which you will place your cluster.
  • You should have Security Groups for the Control Plane Load Balancer and the Nodes created.
  • You should have created the Control Plane Load Balancer.
  • A bastion host, or jump box, with a public IP within your VPC from which you can secure shell into your VMs.
  • A pem file for your AWS region, which you will use to secure shell into your VMs.
Creating the IAM Roles

You will need to create 2 IAM roles: one for the Master(s), and one for the worker nodes.

Master Role

To create an IAM role, go to the IAM (Identity and Access Management) page in the AWS console. On the left-hand menu, click ‘Roles’. Then click ‘Create Role’.

Select the service that will use this role. By default, it is EC2, which is what we want. Then click “Next: Permissions”.

Click ‘Create Policy’. The Create Policy page opens in a new tab.

Click on the ‘JSON tab’. Then paste this json into it:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:*",
                "elasticloadbalancing:*",
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:BatchGetImage",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:UpdateAutoScalingGroup"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

This json defines the permissions that your master nodes will need.

Click ‘Review Policy’. Then give your policy a name and a description.

Click ‘Create Policy’ and your policy is created!

Back on the Create Role page, refresh your policy list, and filter for the policy you just created. Select it and click ‘Next: Tags’.

You should add 2 tags: Name, with a name for your role, and KubernetesCluster, with the name of the cluster that you are going to create. Click ‘Next: Review’.

Give your role a name and a description. Click ‘Create Role’ and your role is created!

Node Role

For the node role, you will follow similar steps, except that you will use the following json:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:Describe*",
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:BatchGetImage"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
Provisioning the VMs
Provisioning the Master

We will use RHEL 7.6 for our cluster because RHEL 8.0 uses iptables v1.8, and kube-proxy does not work well with iptables v1.8. However, kube-proxy works with iptables v1.4, which is installed on RHEL 7.6. We will use the x86_64 architecture.

Log into the AWS console. Go to the EC2 home page and click ‘Launch Instance’. We will search under Community AMIs for our image.

Click ‘Select’. Then choose your instance type. T2.medium should suffice for a Kubernetes master. Click ‘Next: Configure Instance Details’.

We will use only 1 instance. For an HA cluster, you will want more. Select your network and your subnet. For the purposes of this tutorial, we will enable auto-assigning a public IP.  In production, you would probably not want your master to have a public IP.  But you would need to make sure that your subnet is configured correctly with the appropriate NAT and route tables. Select the IAM role you created. Then click ‘Next: Add Storage’.

The default, 10 GB of storage, should be adequate for a Kubernetes master. Click ‘Next: Add Tags’.

We will add 3 tags: Name, with the name of your master; KubernetesCluster, with the name of your cluster; and kubernetes.io/cluster/<name of your cluster>, with the value owned. Click ‘Next: Configure Security Group’.

Select “Select an existing security group” and select the security group you created for your Kubernetes nodes. Click ‘Review and Launch’.

Click ‘Launch’. Select “Choose an Existing Key Pair”. Select the key pair from the drop-down. Check the “I acknowledge” box. You should have the private key file saved on the machine from which you plan to secure shell into your master; otherwise you will not be able to ssh into the master! Click ‘Launch Instances’ and your master is created.

Provisioning the Auto Scaling Group

Your worker nodes should be behind an Auto Scaling group. Under Auto Scaling in the left-hand menu of the AWS console, click ‘Auto Scaling Groups’. Click ‘Create Auto Scaling Group’. On the next page, click ‘Get Started’.

Under “Choose AMI”, select RHEL 7.6 x86_64 under Community AMIs, as you did for the master.

When choosing your instance type, be mindful of what applications you want to run on your Kubernetes cluster and their resource needs. Be sure to provision a size with sufficient CPUs and memory.

Under “Configure Details”, give your autoscaling group a name and select the IAM role you configured for your Kubernetes nodes.

When selecting your storage size, be mindful of the storage requirements of your applications that you want to run on Kubernetes. A database application, for example, would need plenty of storage.

Select the security group that you configured for Kubernetes nodes.

Click ‘Create Launch Configuration’. Then select your key pair as you did for the master. Click ‘Create Launch Configuration’ and you are taken to the ‘Configure Auto Scaling Group Details’ page. Give your group a name. Select a group size. For our purpose, 2 nodes will suffice. Select the same subnet on which you placed your master. Click ‘Next: Configure Scaling policies’.

For this tutorial, we will select “Keep this group at its initial size”. For a production cluster with variability in usage, you may want to use scaling policies to adjust the capacity of the group. Click ‘Next: Configure Notifications’.

We will not add any notifications in this tutorial. Click ‘Next: Configure Tags’.

We will add 3 tags: Name, with the name of your nodes; KubernetesCluster, with the name of your cluster; and kubernetes.io/cluster/<your cluster name>, with the value owned. Click ‘Review’.

Click Create Auto Scaling Group and your auto-scaling group is created!

Installing Kubernetes

Specific steps need to be followed to install Kubernetes. Run the following steps as sudo on your master(s) and worker nodes.

 # add docker repo

yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

# install container-selinux

 yum install -y http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.107-1.el7_6.noarch.rpm

# install docker

yum install docker-ce

# enable docker

systemctl enable --now docker

# create Kubernetes repo. The 2 urls after gpgkey have to be on 1 line.

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

# configure selinux

setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

# install kubelet, kubeadm, kubectl, and Kubernetes-cni. We found that version 1.13.2 works well with RHEL 7.6.

yum install -y kubelet-1.13.2 kubeadm-1.13.2 kubectl-1.13.2 kubernetes-cni-0.6.0-0.x86_64 --disableexcludes=kubernetes --nogpgcheck

# enable kubelet

systemctl enable --now kubelet

# Run the following command as a regular user.

sudo usermod -a -G docker $USER
Creating the Kubernetes Cluster

First, add your master(s) to the control plane load balancer as follows. Log into the AWS console, EC2 service, and on the left-hand menu, under Load Balancing, click ‘Load Balancers’. Select your load balancer and click the Instances tab in the bottom window. Click ‘Edit Instances’.

Select your master(s) and click ‘Save’.

We will create the Kubernetes cluster via a config file. You will need a token, the master’s private DNS name taken from the AWS console, the Load Balancer’s IP, and the Load Balancer’s DNS name. You can generate a Kubernetes token by running the following command on a machine on which you have installed kubeadm:

kubeadm token generate

To get the load balancer’s IP, you must execute a dig command. You install dig by running the following command as sudo:

yum install bind-utils

Then you execute the following command:

dig +short <load balancer dns>

Then you create the following yaml file:

 ---
 apiVersion: kubeadm.k8s.io/v1beta1
 kind: InitConfiguration
 bootstrapTokens:
 - groups:
   - "system:bootstrappers:kubeadm:default-node-token"
   token: "<token>"
   ttl: "0s"
   usages:
   - signing
   - authentication
 nodeRegistration:
   name: "<master private dns>"
   kubeletExtraArgs:
     cloud-provider: "aws"
 ---
 apiVersion: kubeadm.k8s.io/v1beta1
 kind: ClusterConfiguration
 kubernetesVersion: "v1.13.2"
 apiServer:
   timeoutForControlPlane: 10m0s
   certSANs:
   - "<Load balancer IPV4>"
   extraArgs:
     cloud-provider: "aws"
 clusterName: kubernetes
 controlPlaneEndpoint: "<load balancer DNS>:6443"
 controllerManager:
   extraArgs:
     cloud-provider: "aws"
     allocate-node-cidrs: "false"
 dns:
   type: CoreDNS

You then bootstrap the cluster with the following command as sudo:

kubeadm init --config kubeadm.yaml --ignore-preflight-errors=all

I had a timeout error on the first attempt, but the command ran successfully the second time. Make a note of the output because you will need it to configure the nodes.

You then configure kubectl as follows:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

After this there are some components that need to be installed on Kubernetes on AWS:

# Grant the “admin” user complete access to the cluster

kubectl create clusterrolebinding admin-cluster-binding --clusterrole=cluster-admin --user=admin

# Add-on for networking providers, so pods can communicate. 
# Currently either calico.yaml or weave.yaml

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/weave.yaml

# Install the Kubernetes dashboard

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/dashboard.yaml

# Install the default StorageClass

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/default.storageclass.yaml

# Set up the network policy blocking the AWS metadata endpoint from the default namespace.

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/network-policy.yaml

Then you have to configure kubelet arguments:

sudo vi /var/lib/kubelet/kubeadm-flags.env

And add the following parameters:

--cloud-provider=aws --hostname-override=<the node name>

After editing the kubeadm-flags.env file:

sudo systemctl restart kubelet

Finally, you have to label your master with the provider ID. That way, any load balancers you create for this node will automatically add the node as an AWS instance:

kubectl patch node <node name> -p '{"spec":{"providerID":"aws:///<availability zone>/<instance ID>"}}'

You can join worker nodes to the cluster by running the following command as sudo, which should have been printed out after running kubeadm init on the master:

kubeadm join <load balancer dns>:6443 --token <token> --discovery-token-ca-cert-hash <discovery token ca cert hash> --ignore-preflight-errors=all

Be sure to configure kubelet arguments on each node and patch them using kubectl as you did for the master.

Your Kubernetes cluster on AWS is now ready!

As one of the most popular cloud platforms, Microsoft Azure is the backbone of thousands of businesses – 80% of the Fortune 500 companies are on Microsoft cloud, and Azure holds 31% of the global cloud market! 

Microsoft’s customer-centricity shines through the entire Azure stack, and a critical part of it is the Azure Alerts that allows you to monitor the metrics and log data for the whole stack across your infrastructure, application, and Azure platform.

Azure Alerts offers organizations and IT managers access to faster alerts and a unified monitoring platform. Once set up, the software requires minimal technical effort and gives the IT team a centralized monitoring experience through a single dashboard that manages ALL the alerts.

The platform is designed to provide low latency log alerts and metric alerts which gives IT managers the opportunity to identify and fix production and performance issues almost in real-time. Naturally, in complex IT environments, this level of control and overview of the IT infrastructure leads to higher productivity and reduced costs.

Here are more details of how Azure Alerts work

Alerts proactively notify us when important conditions are found in our monitoring data. They allow us to identify and address issues before users notice them.

This diagram represents the flow of alerts

Alerts can be created from

  • Metric values of resources
  • Log search queries results
  • Activity log events
  • Health of the underlying Azure platform

This is what a typical alert dashboard for a single/multiple subscriptions looks like

You can see 5 entities on the dashboard
  • Severity
    • Defines how severe the alert is and how quickly action needs to be taken.
  • Total alerts
    • Total number of alerts received aggregated by the severity of the alert.
  • New
    • The issue has just been detected and hasn’t yet been reviewed.
  • Acknowledged
    • An administrator has reviewed the alert and started working on it.
  • Closed
    • The issue has been resolved. After an alert has been closed, you can reopen it by changing it to another state.

We will now take you through the steps to create Metric Alerts, Log Search Query Alerts, Activity Log Alerts, and Service Health Alerts.

STEPS TO CREATE A METRIC ALERT

Go to Azure monitor. Click ‘alerts’ found on the left side.

To create a new alert, click on the ‘+ New alert rule’.

After clicking ‘+ New alert rule’ this window will appear.

To select a resource, click ‘Select’. A window will be displayed where you can choose the resource by filtering on the subscription, resource type, and the location of the resource. Then select ‘Done’ at the bottom.

Once the resource is selected, now configure the condition. Click ‘select’ to configure the signal. The signal type will show both metrics and activity log for the selected resource.

Select the signal for which you need to create the alert, after selecting the signal, a new consecutive window is displayed, where you need to describe the alert logic.

Set the threshold sensitivity above which the alert should be triggered. Setting the threshold sensitivity applies to static thresholds only.

For dynamic threshold, the value is determined by continuously learning the data of the metric series and trying to model it using a set of algorithms and methods. It detects patterns in the data such as seasonality (Hourly / Daily / Weekly) and can handle noisy metrics (such as machine CPU or memory) as well as metrics with low dispersion (such as availability and error rate).

Now select an ‘action group’ if you already have one or create a new action group.

  • Provide a name for the action group.
  • Select the subscription and resource group where the action group needs to be deployed.
  • If you have selected the action type as Email/SMS/Push/Voice, another window will be displayed to configure the necessary details, such as the email ID or the contact number for SMS and voice notifications; provide the information and select OK.
  • You can see the different action types available in the image below.

Input the alert details: the alert rule name, a description of the alert, and the severity of the rule. Select ‘Enable rule upon creation’.

Finally, click ‘Create alert rule’. It might take some time for the alert to be created and start working.

HOW TO CREATE LOG SEARCH QUERY ALERTS

Repeat steps 1 to 3 as outlined in the Metric alert creation. In step 3 select the resource type as “log analytics workspace”.

Now select the condition; you can choose “Log (saved query)” or select “Custom log search”.

Select the signal name as per your requirements; a new signal window will be displayed containing the attributes corresponding to the selected signal.

Here we have selected a saved query, which provides the result shown as above.

  1. Rule created based on “Number of results” and the threshold provided, or
  2. Rule created based on “Metric measurement” and the threshold provided, where the alert is triggered by either
    • total breaches, or
    • continuous breaches of the threshold provided in the metric measurement.

Provide the evaluation period and the frequency, in minutes, at which the alert rule should be evaluated.

Follow steps 8 and 9 as outlined in the Metric alert creation.

STEPS TO CREATE ACTIVITY LOG ALERTS

Repeat steps 1 to 3 as outlined in the Metric alert creation. In step 3 select the resource type as “log analytics workspace”.

On selecting the condition, click ‘Monitor Service’ and select ‘Activity Log – Administrative’.

Here we have selected all administrative operations as the signal.

Now configure the alert logic. The event level has many types; select one as per your requirement and click ‘Done’ at the bottom.

Follow steps 8 and 9 as outlined in the Metric alert creation.

STEPS TO CREATE A SERVICE HEALTH ALERT

You can receive an alert when Azure sends service health notifications to your Azure subscription. You can configure the alert based on:

  • The class of service health notification (Service issues, Planned maintenance, Health advisories).
  • The subscription affected.
  • The service(s) affected.
  • The region(s) affected.

Log in to the Azure portal and search for ‘Service Health’, or find it on the left-hand menu. Click ‘Service Health’.

The Service Health service is now visible; select ‘Health alerts’ in the Alerts section.

Select Create service health alert and fill in the fields.

Select the subscription and services for which you need to be alerted.

Select the region where your resources are located and select the ‘Event type’. Azure provides the following event types:

Select all the event types so that you receive alerts irrespective of the event type.

Follow steps 8 and 9 as outlined in the Metric alert creation, then click ‘Create alert rule’. The service health alert can be seen in the Health Alerts section.

Kubernetes is the reigning market leader when it comes to container orchestration! Any organization working with the container ecosystem is either already using Kubernetes or considering it. However, despite the undoubted ease and speed Kubernetes brings to the container ecosystem, it also needs specialized expertise to deploy and manage.

Many organizations consider the DIY approach to Kubernetes, and if you have an in-house IT team with the requisite experience, or if your requirements are large enough to justify the cost of hiring a dedicated Kubernetes team, then an internal Kubernetes strategy could certainly be beneficial.

However, if you don’t fall in the category mentioned above, then managed Kubernetes is the smartest and most cost-effective way ahead. With professionals in the picture, you can be assured of getting long term strategy, seamless implementation, and dedicated on-going service, which will

  • reduce deployment time
  • provide 24×7 support
  • handle all upgrades and fixes
  • troubleshoot as and when needed

Kubernetes solution providers offer a wide range of services – from fully managed to bare bone implementation to preconfigured Kubernetes environments on SaaS models to training for your in-house staff.

Look at your operational needs and your budget and explore the market for Kubernetes services options before you pick the service and the digital partner that ticks all your boxes.   

Meanwhile, do look at our tutorial on troubleshooting Kubernetes deployments.

Kubernetes deployment issues are not always easy to troubleshoot. In some cases, the errors can be resolved easily, and in other cases, detecting them requires us to dig deeper and run various commands to identify and resolve the issues.

The first step is to list all pods after installing your application. The following command lists all pods in all namespaces.

kubectl get pods -A

If you find any issues on the pod status, you can then use kubectl describe, kubectl logs, kubectl exec commands to get more detailed information.

Debugging Pods
Pod Status Shows ImagePullBackOff or ErrImagePull

This status indicates that your pod could not run because the pod could not pull the image from the container registry. To confirm this, run the kubectl describe command along with the pod identifier to display the details.

kubectl describe pod <pod-identifier>

This command will provide more information about the issue.

  • Image name or tag incorrect.
    • Check the image name and tag and try to pull the image manually on the host using docker pull to verify.
  • Authentication failure related to Container registry.
    • Check the secrets, roles, service principal related to your container registry and try to pull the image manually on the host using docker pull to verify.
docker pull <image-name:tag> 
Pod Status Shows Waiting

This status indicates your pod has been scheduled to a worker node, but it can’t run on that machine. To confirm this, run the kubectl describe command along with the pod identifier to display the details.

kubectl describe pod <pod-identifier> -n <namespace>

The most common causes related to this issue are

  • Image name or tag incorrect.
    • Check the image name and tag and try to pull the image manually on the host using docker pull to verify.
  • Authentication failure related to Container registry.
    • Check the secrets, roles, service principal related to your container registry and try to pull the image manually on the host using docker pull to verify.
Pod Status Shows Pending or CrashLoopBackOff

This status indicates your pod could not be scheduled on a node for various reasons, such as resource constraints (insufficient CPU or memory) or volume mounting issues.  To confirm this, run the kubectl describe command along with the pod identifier to display the details.

kubectl describe pod <pod-identifier> -n <namespace>

This command will provide more information about the issue. The most common issues are:

  • Insufficient resources
    • If resources are insufficient, clean up existing resources or scale your nodes (vertically or horizontally) to increase capacity.
  • Volume mounting
    • Check your volume mount definitions and storage classes.
  • Using hostPort
    • When you bind a pod to a hostPort, there are a limited number of places it can be scheduled. In most cases, hostPort is unnecessary; try using a Service object to expose your pod. If you do require hostPort, then you can only schedule as many pods as there are nodes in your Kubernetes cluster.
Pod is crashing or unhealthy

Sometimes the scheduled pods are crashing or unhealthy.  Run kubectl logs to find the root cause.

kubectl logs <pod_identifier> -n <namespace>

If you have multiple containers, run the following command to find the root cause.

kubectl logs <pod_identifier> -c <container_name> -n <namespace>

If your container has previously crashed, you can access the previous container’s crash log with:

kubectl logs --previous <pod_identifier> -c <container_name> -n <namespace>

If your pod is running but shows a 0/1 or 0/2 ready state (the latter if you have multiple containers in your pod), then you need to verify the readiness. Check the health check (readiness probe) in this case.

The most common issues are:

  • Application issues
    • Run the below command to check the logs.
      kubectl logs <pod_identifier> -c <container_name> -n <namespace>
    • Run the below command to verify the events.
      kubectl describe pod <pod_identifier> -n <namespace>
  • Readiness probe health check failed
    • Check the health check (readiness probe) in this case; a sample probe definition is sketched after this list. Also, check the READY column of the kubectl get pods output to find out whether the readiness probe is passing.
    • Run the below command to check the logs.
      kubectl logs <pod_identifier> -c <container_name> -n <namespace>
    • Run the below command to verify the events.
      kubectl describe pod <pod_identifier> -n <namespace>
  • Liveness probe health check failed
    • Check the health check (liveness probe) in this case. Also, check the RESTARTS column of the kubectl get pods output to find out whether the liveness probe is failing and restarting the container.
    • Run the below command to check the logs.
      kubectl logs <pod_identifier> -c <container_name> -n <namespace>
    • Run the below command to verify the events.
      kubectl describe pod <pod_identifier> -n <namespace>
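
The readiness and liveness probes referred to above are declared on the container in the pod spec. Below is a minimal sketch of what they can look like; the HTTP path, port, and timing values are illustrative and would need to match your application.

containers:
- name: nginx
  image: nginx
  readinessProbe:            # pod only counts as Ready (1/1) when this check passes
    httpGet:
      path: /
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 10
  livenessProbe:             # kubelet restarts the container if this check keeps failing
    httpGet:
      path: /
      port: 80
    initialDelaySeconds: 15
    periodSeconds: 20
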
Pod is running but has application issues

In some cases, the pods are running, but the output of the application is incorrect. In this case, you should run the following to find the root cause.

  • Run the below command and identify the issue.
kubectl logs <pod_identifier> -c <container_name> -n <namespace>
  • If you are interested in only the last n lines of the logs, run
kubectl logs <pod_identifier> -c <container_name> --tail <n-lines> -n <namespace>
  • Run commands inside the container using
kubectl exec -it <pod_identifier> -c <container_name> -n <namespace> -- /bin/bash

Run commands like curl, ps, or ls to troubleshoot the issue after you get into the container.

Pod is running and working but cannot access through services

In some cases, the pods are working as expected but cannot be accessed through the services. The most common causes of this issue are:

  • Service not registered properly
    • Check that the service exists, describe it, and validate the pod selectors by running the following commands.
kubectl get svc
kubectl describe svc <svc-name>
kubectl get endpoints
  • Run the following commands to verify pod selector
kubectl get pods --selector=name={name},{label-name}={label-value}
  • The service may be deployed in a different namespace.
    • Verify that the pod’s containerPort matches up with the Service’s targetPort.
  • Service is registered properly but has a DNS issue
    • Get into the container using the exec command and run nslookup with the following commands.
kubectl get endpoints
kubectl exec <pod_identifier> -c <container_name> /bin/bash
nslookup <service-name>
  • If you have issues running curl or nslookup inside the container, deploy a debugging pod using the image yauritux/busybox-curl in the same namespace and verify from there. Run the following command to do so.
kubectl run --generator=run-pod/v1 -it --rm <name> --image=yauritux/busybox-curl -n <namespace>
  • Run the following to verify within the container
curl http://<servicename>
telnet <service-ip> <service-port>
nslookup <servicename>

When Kubernetes v1.0 was released on July 21, 2015, it redefined the container technology landscape. Application deployment, scaling, and management in containers – previously full of bottlenecks – were made simpler and faster with intelligent automation.

Container technology made software development more agile and brought in resource efficiency – it made scaling smoother and faster. However, containers also need to be tracked, monitored, and managed, which is where container orchestration and Kubernetes come in.

Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management.  It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation.

What does Kubernetes do?

Kubernetes allows you to leverage the full potential of your container ecosystem. With automation, it streamlines container workflow and frees up the IT team to concentrate on their core areas of application development by removing the need to manage container networking, storage, logs, alerting, etc. Overall, it automates deploying, scaling, and managing of containerized applications on a cluster of servers.

Key Benefits of Kubernetes

Flexibility for scaling – it enables horizontal infrastructure scaling by quickly adding or removing new servers. Kubernetes has the option of automating vertical scaling, too, by taking into account application-provided metrics.

Health check and self-healing designed in Kubernetes allow it to maintain high availability of applications and infrastructure.

Enhanced deployment speed – with automated rollouts and rollbacks, canary deployments, and wide-ranging support for a variety of programming languages, Kubernetes speeds up the process of building, testing, and deploying new software.

Let’s understand more about Kubernetes concepts

1. Kubernetes Objects

Kubernetes contains several abstractions that represent the state of our system: deployed containerized applications and workloads, their associated network and disk resources, and other information about what our Kubernetes cluster is doing. These abstractions are represented by objects in the Kubernetes API.  The basic Kubernetes objects include:

  • Pod
  • Service
  • Volume
  • Namespace

In this blog, we will look at the Pod and Service objects.

2. Pods

A pod is a higher level of abstraction grouping containerized components.  A pod consists of one or more containers that are guaranteed to be co-located on the host machine and can share resources.  The basic scheduling unit in Kubernetes is a pod.  The host machines on which the pods are scheduled are called Nodes.

3. Pod definition yaml

Kubernetes objects are mostly created by declaring their configuration in a YAML file.
Given below is a YAML file defining a simple pod.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 8080

In the above yaml file,

  1. apiVersion – denotes which version of the Kubernetes API we are using to create this object.
  2. kind – specifies what kind of object we want to create.  For the Pod object, the apiVersion is always v1.
  3. metadata – has data to uniquely identify the object (name) and labels.
  4. Labels are key/value pairs that are attached to objects.  Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users but do not directly imply semantics to the core system.  So, in the above example, instead of “name: nginx” we can have “appname: nginx”, “name: mynginxapp” or anything we like.
  5. Spec – defines the object specification and differs for each object type.  For Pod object, the spec has an array of containers since a pod consists of one or more containers.
  6. For each container, we provide the below attributes:
  • name – the name of the container.  This can be different from the name of the pod and is not related to it.
  • image – the name of the docker image to be used to create this container
  • ports – the ports in this container to be exposed outside the pod.  Here we expose port 8080, on which the nginx web server is assumed to be listening (the stock nginx image listens on port 80 by default, so it would need to be configured accordingly).

Assuming we have the above pod definition in a file named pod-definition.yaml, the pod is created by executing the below Kubernetes command:

$ kubectl create -f pod-definition.yaml
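
The pod can then be verified with the usual commands (the pod name nginx comes from the metadata above):

$ kubectl get pods
$ kubectl describe pod nginx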

4. Pod communication and need for services

Each pod in Kubernetes is assigned a unique Pod IP address within the cluster, which allows applications to use ports without the risk of conflict. 

Within the pod, all containers can reference each other on localhost, but a container within one pod has no way of directly addressing another container within another pod; for that, it must use the Pod IP Address.

An application developer should never use the Pod IP address, though, to reference or invoke a capability in another pod, as Pod IP addresses are ephemeral – the specific pod that they are referencing may be assigned a different Pod IP address on restart.  Instead, we should use a reference to a Service, which holds a reference to the target pod at its current Pod IP address.

5. Services

In Kubernetes, a Service is an abstraction that defines a logical set of Pods and a policy by which to access them. The set of Pods targeted by a Service is usually determined by a selector.

Sample YAML for a service to expose the pod(s) which we created earlier is given below:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    name: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
      nodePort: 30230
  type: NodePort

This yaml file defines a service named my-service which is used to access the Pods which have a label ‘name: nginx’. 

  1. The selector field of the service must match the label field of the Pods to which we want to connect.
  2. There are 3 ports defined in the above YAML file:
  • Port is the port number that makes a service visible to other services running within the same Kubernetes cluster.
  • Target Port is the port on the pod where the service is running.  This is an optional field; if not provided, Kubernetes assigns it the same value as the Port field.
  • Node Port is the port on which the service can be accessed by external users.  NodePort can only have values from 30000 to 32767.  If this optional field is not provided in the definition, Kubernetes automatically assigns a value for it.

To create the service object, enter the above yaml code in a file named service-defn.yaml and execute the command given below:

$ kubectl create -f service-defn.yaml
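
To check that the service was created and has picked up the nginx pod as an endpoint, the following commands can be used. Since the type is NodePort, the service should also be reachable from outside the cluster on port 30230 of any node; the node IP below is a placeholder.

$ kubectl get svc my-service
$ kubectl get endpoints my-service
$ curl http://<node-ip>:30230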

6. Types of Services

In the above example, we have type: NodePort for the service.  The different values allowed for the type field are:

  1. ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default Service type.
  2. NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort).   A ClusterIP Service, to which the NodePort Service routes, is automatically created.  We will be able to contact the NodePort Service, from outside the cluster, by requesting <NodeIP>:<NodePort>
  3. LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
  4. ExternalName: Maps the Service to the contents of the externalName field (e.g., foo.bar.example.com), by returning a CNAME record with its value.  No proxying of any kind is set up.  We need CoreDNS version 1.7 or higher to use the ExternalName type.
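
As an illustration of the last type, here is a minimal sketch of an ExternalName service; the service name my-external-service is illustrative, and foo.bar.example.com is the external DNS name from the description above.

apiVersion: v1
kind: Service
metadata:
  name: my-external-service
spec:
  type: ExternalName
  externalName: foo.bar.example.com

Resolving my-external-service inside the cluster then returns a CNAME record for foo.bar.example.com, with no proxying involved.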

Many of you are running your mission-critical applications on containers, and if you haven’t already deployed Kubernetes to manage your container ecosystem, then chances are you soon will.

If you are considering a Kubernetes implementation, then there are several ways to go about it –

  • In-house Kubernetes deployment – if you have a large enough IT team with the requisite expertise in Kubernetes architecture and deployment, then getting your Kubernetes cluster up and running in-house is certainly a possibility. Kubernetes deployment is a complex process and requires a mix of specific skill sets. Also running and monitoring a Kubernetes platform requires the full-time services of a dedicated team, and your requirement must justify this additional cost.
  • SaaS Solutions for Kubernetes– if your business needs are specific and straightforward, then you can explore the market for pre-designed Kubernetes offerings on a SaaS payment model.
  • Fully outsourced (managed) Kubernetes services – if budget permits and your business demands, then bringing in the professionals is a safe and hassle-free solution. From infrastructure assessments to building a Kubernetes strategy to engineering, deploying, and managing enterprise-wide Kubernetes solutions – you can outsource your entire project to experts.
  • Many service providers like CloudIQ also offer day-to-day management and support as well as Kubernetes training to your IT staff to set up internal management expertise.

If the last decade of cloud has taught us anything, it is that when it comes to technology, bringing in professionals to do the job always turns out to be the best option in the long run. Kubernetes is a sophisticated platform that requires specialized competencies. Here is a look at one of our tutorials on Kubernetes Networking – how it all works under the hood.

KUBERNETES NETWORKING – DATA PLANE

In Kubernetes, applications run as a set of pods with their own IP address and port. Kubernetes provides an abstract way to expose the applications/pods as a network service. The various forms of service abstraction include ClusterIP, NodePort, LoadBalancer and Ingress. When service requests enter the Kubernetes cluster, these service abstractions have to be directed to individual service endpoints (Pods). This data plane function is implemented using a Linux kernel feature – iptables.

Iptables is used to set up, maintain, and inspect the tables of IP packet filter rules in the Linux kernel. Several different tables may be defined. Each table contains a number of built-in chains and may also contain user-defined chains. Each chain is a list of rules which can match a set of packets. Each rule specifies what to do with a packet that matches. This is called a ‘target’, which may be a jump (-j) to a user-defined chain in the same table.

The service (SVC) to service endpoint (SEP) mappings are programmed using KUBE-SERVICES user-defined chains in the NAT (Network Address Translation) table. The contents of the iptables rules can be extracted using the iptables-save command.
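
On a cluster node, the NAT table can be dumped as shown below (root privileges are typically required); the output that follows is an excerpt of such a dump.

# iptables-save -t nat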

# Generated by iptables-save v1.6.0 on Mon Sep 16 08:00:17 2019
*nat
:PREROUTING ACCEPT [1:52]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [23:1438]
:POSTROUTING ACCEPT [10:592]
:DOCKER - [0:0]
:IP-MASQ-AGENT - [0:0]

:KUBE-SERVICES - [0:0]

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES

Let’s consider the Services in the following example.

cloudiq@hubandspoke:~$ kubectl get svc --namespace=workshop-development

NAME                                                         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
ciq-ingress-workshop-development-nginx-ingress-controller    LoadBalancer   192.168.5.65   10.82.0.97    80:30512/TCP,443:31512/TCP   19h

Here we have the following service abstractions that are defined.

LoadBalancerIP=10.82.0.97

NodePort=30512/31512

ClusterIP=192.168.5.65

The above services have to be translated to individual service endpoints. The rules performing the matching and translation are programmed using custom chains in the NAT table of iptables, as shown below.

Let’s look for the LoadBalancer service at 10.82.0.97.

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep 10.82.0.97
-A KUBE-SERVICES -d 10.82.0.97/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:http loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-SXB4UOYSLPHVISJM
-A KUBE-SERVICES -d 10.82.0.97/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -m tcp --dport 443 -j KUBE-FW-JLRSZDR3OXJ4SUA2

Let’s look at the HTTPS service available on port 443.

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-FW-JLRSZDR3OXJ4SUA2

:KUBE-FW-JLRSZDR3OXJ4SUA2 - [0:0]
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-MARK-MASQ
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-MARK-DROP
-A KUBE-SERVICES -d 10.82.0.97/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -m tcp --dport 443 -j KUBE-FW-JLRSZDR3OXJ4SUA2

Below we see the NodePort and ClusterIP translation. The service chain (SVC) points to two different service endpoints. To select between the two, a random probability is evaluated and the appropriate SEP (service endpoint) chain is selected.

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-SVC-JLRSZDR3OXJ4SUA2

:KUBE-SVC-JLRSZDR3OXJ4SUA2 - [0:0]
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-NODEPORTS -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https" -m tcp --dport 31512 -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-SERVICES -d 192.168.5.65/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-4R3FOXQSM5T2ZADC
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -j KUBE-SEP-PI7R3ONIYH4XJLMW

In the SEP service endpoints, the actual DNAT is performed.

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-SEP-4R3FOXQSM5T2ZADC
:KUBE-SEP-4R3FOXQSM5T2ZADC - [0:0]
-A KUBE-SEP-4R3FOXQSM5T2ZADC -s 10.82.0.10/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-4R3FOXQSM5T2ZADC -p tcp -m tcp -j DNAT --to-destination 10.82.0.10:443
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-4R3FOXQSM5T2ZADC
cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-SEP-PI7R3ONIYH4XJLMW
:KUBE-SEP-PI7R3ONIYH4XJLMW - [0:0]
-A KUBE-SEP-PI7R3ONIYH4XJLMW -s 10.82.0.82/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-PI7R3ONIYH4XJLMW -p tcp -m tcp -j DNAT --to-destination 10.82.0.82:443
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -j KUBE-SEP-PI7R3ONIYH4XJLMW

Automation Testing helps complete the entire software testing life cycle (STLC) in less time and improves the efficiency of the testing process.

Test Automation enables teams to verify functionality, test for regression and run simultaneous tests efficiently. In this article we will take a detailed look at the Automation Testing Tools available, standards and best practices to be followed during Test Automation.

Following the best practices for the Software Testing Life Cycle (unit testing, integration testing, and system testing) ensures that the client gets the software as intended, without bugs. End-to-end testing is the methodology used to test whether the flow of an application performs as designed from start to finish. Carrying out end-to-end tests helps identify system dependencies and ensures that the right information flows across the various system components.

Ultimately Automation Testing increases the speed of test execution and the test coverage.

When to Choose Automation Testing
  • There is a lot of regression work
  • The GUI is stable, but there are frequent functional changes
  • Requirements do not change frequently
  • Load and performance testing with many virtual users
  • Repetitive test cases that lend themselves well to automation and save time
  • Huge projects
  • Projects that need to test the same areas repeatedly

Steps to Implement Automation Testing
  • Identify areas within software to automate
  • Choose the appropriate tool for test automation
  • Write test scripts
  • Develop test suites
  • Execute test scripts
  • Build result reports
  • Find possible bugs or performance issues
Choosing your Automation Testing Tool

The strategy to adopt test automation should clearly define when to opt for automation, its scope and selection of the right kind of tools for execution. And when it comes to tools the top ones to go for are

  • Cypress
  • Selenium
  • Protractor
  • Appium(Mobile)
Why Cypress?

Cypress is a JavaScript based testing framework built for the modern web. Cypress helps to create End-to-end tests, Integration tests and Unit tests. Cypress takes a different approach compared to other testing frameworks, since it’s executed in the same run loop as the application. It also leverages a Node.js server to handle any task that needs to happen outside of the browser. With its ability to understand everything happening inside and outside of the browser, it produces more consistent results.

Key Features of Cypress
  • Automatic Waiting – No need for adding wait and sleep.
  • Spies, Stubs, and Clocks – Verify and control the behaviour of functions, server responses, or timers.
  • Network traffic control and monitoring – Easily control, stub, and test edge cases without involving your server. You can stub network traffic however you like.
  • Consistent Results – The Cypress architecture doesn’t use Selenium or WebDriver. It is fast, consistent, and produces reliable, flake-free tests.
  • Screenshots and Videos – View screenshots taken automatically on failure, or videos of your entire test suite when run from the CLI.
Azure CICD Setup with Cypress

Cypress runs on most of the following CI providers.

Azure DevOps / VSTS CI / TeamFoundation
BitBucket
CircleCI
Docker
GitLab
Jenkins
TravisCI

Azure DevOps – Steps to Integrate Cypress Automation Tests
  • Pre-Build Testing
    • Install the Node modules and run the application in test mode
    • Run the tests
    • Publish the test results
  • Cypress Containerization
    • Build the docker container for cypress
    • Push the image to the container registry
    • Publish the Build

Before we get started, here are the basic Cypress commands used in this setup.

Clean up the old results:
$ rm -rf cypress/reports/

Run the cypress application with the required spec file:
$ cypress run --spec "cypress/integration/**/*.spec.ts" // mention your spec file

Configure the mocha reporter path for publishing test results:
--reporter junit --reporter-options 'mochaFile=cypress/reports/test-output-[hash].xml,toConsole=true'

Uninstall the application:
$ npm uninstall cypress-multi-reporters; npm uninstall cypress-promise; npm uninstall cypress

Pre-Build Testing

It is critical to test the application before the Build, Deployment or Release. Essentially the process involves regression and smoke testing. And don’t forget the sanity checks before the build is deployed in the staging environment.

Cypress comes in handy for testing angular / JavaScript applications before they are deployed to staging or production environment.

Install the Node module and run the application in test mode

Install the required node modules for the application, then run the application in test mode.

$ npm install --save-dev start-server-and-test

$ start-server-and-test start http://localhost:4200 cy:run   # cy:run is assumed to be an npm script that runs cypress

Publish the test results

The results of the Cypress test execution are stored in the specified path and are added to the Azure DevOps test results. Cypress supports the JUnit, Mocha, and Mochawesome test result reporter formats, and provides options to create customised test results and merge all the test results as well.
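
As a minimal sketch of how these pre-build steps might look in an azure-pipelines.yml (assuming package.json defines start and cy:run scripts, and that the JUnit reporter writes to cypress/reports as configured above):

steps:
  - script: npm ci
    displayName: Install dependencies
  - script: npx start-server-and-test start http://localhost:4200 cy:run
    displayName: Run Cypress tests against the local dev server
  - task: PublishTestResults@2
    displayName: Publish Cypress JUnit results
    condition: always()                      # publish results even when tests fail
    inputs:
      testResultsFormat: JUnit
      testResultsFiles: 'cypress/reports/*.xml'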

Cypress Containerization

Cypress supports docker containerization and that makes it easy to set it up in a cluster environment like AKS. The Cypress base images are available at the link below.

https://github.com/cypress-io/cypress-docker-images

Copy the package.json and UI source code to the app folder and run the Cypress tests. The following commands are used to run the docker container and execute the tests.

  - script: |
      docker run -d -it --name cypressName:cypressImageTag bash
      docker commit -p cypressName:cypressImageTag
      docker stop cypressName
      docker rm -f cypressName

  - script: docker tag cypressName:cypressImageTag
    displayName: Tag Cypress image

  - task: Docker@1
    displayName: Push image To Registry
    inputs:
      command: push
      azureSubscriptionEndpoint: azureSubscriptionEndpoint
      azureContainerRegistry: $(azureContainerRegistry)
      imageName: acrImageName:BuildId

  - script: sudo rm -rf /test-results/*
    displayName: Removing Previous Results

  - task: ShellScript@2
    displayName: 'Bash Script - cypress base image post-deployment'
    inputs:
      scriptPath: ./cypress-deployment.sh
      args: $(azureRegistry) $(cypressImageName) $(azureContainerValue) $(CYPRESS_OPTIONS)
    continueOnError: true

  - task: PublishTestResults@1
    displayName: 'Publish Test Results ./test-results-*.xml'
    inputs:

cypress-base-image-post-deployment.sh

docker run -v $systemSourceDirectory:/app/cypress/reports --name vca-arp-ui \
    $cypress_Latestimage npx cypress run $cypressOptions bash

Now the container should be set up on your local machine and running your specs.

Cypress is simple and easily integrates with your CI environment. Apart from the browser support, Cypress reduces the efforts of manual testing and is relatively faster when compared to other automation testing tools.

In this article we will discuss how to create security groups in AWS for Kubernetes. The goal is to set up a Kubernetes cluster on AWS EC2, having provisioned your virtual machines. You are going to need two security groups: one for the control plane load balancer, and another for the VMs.

Creating a Security Group through the AWS Console

Prerequisite: You should have a VPC (virtual private cloud) set up.

Log into the AWS EC2 (or VPC) console. On the left-hand menu, under Network and Security, click Security Groups.

Click on Create Security Group.

Enter a Name and a Description for your Security Group. Then select your VPC from the drop-down menu. Click Add Rule.

You will need 2 TCP ingress rules, one over port 6443, another over port 443. We are choosing to allow the Source from anywhere. In production you may want to restrict the CIDR, IP, or security group that can reach this load balancer.

We are choosing to leave the outbound rules as default, in which all outbound traffic is permitted.

Click Create and your security group is created!

Select your security group in the console. You may want to give your security group a Name (in addition to the Group Name that you specified when creating it).

But you are not done yet: you must add tags to your security group. These tags will alert AWS that this security group is to be used for Kubernetes. Click on the Tags tab at the bottom of the window. Then click Add/Edit Tags.

You will need 2 tags:
  • Name: KubernetesCluster. Value: <the name of your Kubernetes cluster>
  • Name: kubernetes.io/cluster/<the name of your Kubernetes cluster>. Value: owned

Click Save and your tags are saved!
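
If you prefer to keep this in code rather than clicking through the console, a rough CloudFormation sketch of the same load balancer security group looks like the following; the VPC ID and cluster name are placeholders.

Resources:
  ControlPlaneLBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for the Kubernetes control plane load balancer
      VpcId: <your-vpc-id>
      SecurityGroupIngress:
        - IpProtocol: tcp            # Kubernetes API server
          FromPort: 6443
          ToPort: 6443
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp            # HTTPS
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: KubernetesCluster
          Value: <your-cluster-name>
        - Key: kubernetes.io/cluster/<your-cluster-name>
          Value: owned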

Creating a Security Group for the Virtual Machines

Follow the steps above to create a security group for your virtual machines. Here are the ports that you will need to open for your VMs:

The master node:
  1. 22 for SSH from your bastion host
  2. 6443 for the Kubernetes API Server
  3. 2379-2380 for the ETCD server
  4. 10250 for the Kubelet health check
  5. 10252 for the Kube controller manager
  6. 10255 for the read only kubelet API
The worker nodes:
  1. 22 for SSH
  2. 10250 for the kubelet health check
  3. 30000-32767 for external applications. However, it is more likely that you will expose external applications to outside the cluster via load balancers, and restrict access to these ports to within your VPC.
  4. 10255 for the read only kubelet API

We have chosen to combine the master and the worker rules into one security group for convenience. You may want to separate them into 2 security groups for extra security.

Follow the step-by-step instructions detailed above and you will have successfully created AWS Security Groups for Kubernetes.

What is New Relic Synthetics?

New Relic Synthetics is a set of automated, scriptable tools used to monitor websites, critical business transactions, and API endpoints. Detailed individual results from each monitor run can also be viewed. With access to New Relic Insights, in-depth queries of data from Synthetics monitors can be run, and custom dashboards can be created.

Features of New Relic Synthetics:
  • Easy to set up real time instrumentation and analytics
  • REST API functions
  • Real browsers
  • Comparative charting with Browser
  • New Relic Insights support
  • Advanced scripted monitoring
  • Global test coverage
Different types of Synthetic Monitor:

There are four types of monitors.

a) Ping monitor:

Ping monitors are the simplest type of monitor. These monitors are used to check if an application is online. The Synthetics ping monitor uses a simple Java HTTP client to make requests to your site.

b) API tests:

API tests are used to monitor API endpoints. This can ensure that the app server works in addition to the corresponding website. New Relic uses the “http request module” internally to make HTTP calls to the API endpoint and validate the results.

c) Browser:

Simple browser monitors essentially are simple, pre-built scripted browser monitors. These monitors make a request to the site using an instance of Google Chrome.

d) Script_Browser:

Scripted browser monitors are used for more sophisticated, customized monitoring. A custom script can be created to navigate to the website, take specific actions and ensure that the specific resources are present.

Creation of Synthetic Monitor:

API Test Monitor:

Step 1:

  • Log in to the New Relic monitor

Step 2 – Create a synthetic monitor

  • Click “Synthetics” in the New Relic dashboard, then click “Add new” in the upper-right corner.

Step 3: Enter the Required Details

  • Select “API Test” as the monitor type.
  • Enter the monitor name under Details.
  • Select one location for the monitor under Monitoring locations.
  • Set the Schedule – set the frequency for monitoring. For example, if the frequency is set to 10 minutes, the monitor runs its check every 10 minutes.
  • Set Notification – notifications to email IDs can be set with the help of a new alert policy or appended to an existing alert policy. For an existing alert policy, click “Add to an existing alert policy” and select the policy. For a new policy, an email address and a policy name have to be given. There are three types of policy:
    1. By policy – only one open incident at a time for this alert policy.
    2. By condition – only one open incident at a time per alert condition.
    3. By condition and entity – open an incident every time a condition is violated.
  • Only on completing the above steps can the script be written, by clicking on “Write your script”.
  • Click on “Create monitor” after the monitor creation steps are done.
PING Monitor:

Step 1:

  • Log in to the New Relic monitor

Step 2 – Create a synthetic monitor

  • Click “Synthetics” in the New Relic dashboard, then click “Add new” in the upper-right corner.

Step 3: Enter the Required Details

  • Select “Ping” as the monitor type.
  • Enter the monitor name under Details.
  • Enter the URL and the expected response for that URL.
  • Select one location for the monitor under Monitoring locations.
  • Set the Schedule – set the frequency for monitoring. For example, if the frequency is set to 10 minutes, the monitor runs its check every 10 minutes.
  • Set Notification – notifications to email IDs can be set with the help of a new alert policy or appended to an existing alert policy. For an existing alert policy, click “Add to an existing alert policy” and select the policy. For a new policy, an email address and a policy name have to be given. There are three types of policy:
    1. By policy – only one open incident at a time for this alert policy.
    2. By condition – only one open incident at a time per alert condition.
    3. By condition and entity – open an incident every time a condition is violated.
  • On completing the above steps, the Ping monitor is created when clicking on “Create monitor”.

Synthetic Monitor Functionality:

API Test:
Pass Scenario:

The script stores data using the POST method and then passes the value to a callback function. A callback function is simply a function that is passed into another function as an argument.

Here, the callback function has three arguments: error, response, and body.

The script compares the values “gear” and “10” with the corresponding JSON body values. Both values match, so no assertion error is triggered.

In case of a value mismatch, an assertion error is thrown.

Failure scenario:

In the failure scenario, the values do not match the JSON body values, so an assertion error is thrown.

In case of an assertion error, an alert is sent to the email ID given in the notification channel. The assertion error will not be resolved until the value is made “10”.

Mail Alert: (Ping & API Test)

When an assertion error occurs, an email alert containing the error log is sent.

After the error is fixed, an update is sent to the notification channel.

Delete a Monitor: (Ping & API Test)
  • From the Monitors list, select the monitor that needs to be deleted.
  • In the selected monitor, under Settings, click on General to view the monitor settings page.
  • Select the trash icon; an alert popup is shown. Click “OK” in the popup and the monitor will be deleted.

Introduction:

Cloud Foundry is an open-source cloud application platform with a container-based architecture. It provides cloud instances and is mainly used to deploy applications directly into a cloud environment. Rather than handling each step separately, the CF CLI (Command Line Interface) tool is used to deploy, test, configure, and manage apps on CF.

Features of Cloud Foundry:
  • An open-source Cloud Native Platform
  • Fast and easy to build, test, deploy, manage, and scale apps
  • Works with any language or framework
  • Highly adaptable
  • The running status of apps can be viewed
  • Apps can be scaled up or down and debugged on CF
How to interact with CF?
  • Command Line Interface (CLI): from terminal / command prompt
  • IDE plugins
Org and App Space Roles:

CF uses role-based access control, with each role granting permissions in either an organization or an application space.

Organisation:
  • An Organisation or org represents an organisational account and groups together users, resources, applications, and environments.
  • Each organisation has a resource quota, and its members share the same resources and domains.
  • Organisations segregate tenants in a Cloud Foundry installation.

To list all orgs that the user has access to, the below command can be given in the terminal.

cf orgs
Space:
  • An organisation can have separate spaces for development, staging, and production versions of the apps.
  • A space can also have its own quota.
  • It is the shared location for developing and running apps.
  • Every application and service is scoped to a space.

To list all spaces in the current org:

cf spaces
Relationship between org, space and Apps:
 
An org contains one or more spaces, and each space contains the apps deployed there along with their service instances.
Before pushing the app into Cloud Foundry, ensure that you:
  • Log in to Cloud Foundry using the cf login command (a short example session follows this list)
    • cf login -a API-URL
  • Provide the correct credentials when prompted for a username and password
  • Select the org and space where the app is to be pushed
  • Then push the application using cf push
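
A typical session might look like this; the API endpoint, org, space, and app name are placeholders.

cf login -a <API-URL>
cf target -o <org-name> -s <space-name>
cf push <app-name>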
How to deploy an app into cf?

To deploy an application, you need to push its code to the Cloud Foundry instance. The push command is used to push the application to Cloud Foundry. The arguments may vary depending on the application type; however, it is best practice to specify all the arguments in a file called manifest.yml.

This provides consistency and reproducibility. An app can specify its service instance dependencies in the manifest.yml file, and it will automatically be bound to those service instances.

# Start a new app called “myapp”
# If there’s a manifest.yml in the current folder,
# the config will be read from there
cf push
Manifest Format

Manifests are written in YAML. The below manifest illustrates some YAML conventions, as follows:

  • The manifest file begins with three dashes.
  • The applications block begins with a heading followed by a colon.
  • The app name is preceded by a single dash and one space.
  • Subsequent lines in the block are indented two spaces to align with name
Sample manifest.yml

---
applications:
- name: my-app
  memory: 512M
  instances: 2
  buildpack: nodejs_buildpack
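
As mentioned above, service instance dependencies can also be declared in the manifest so they are bound automatically on push; a hedged sketch is shown below, where my-database is a placeholder for an existing service instance in the target space.

---
applications:
- name: my-app
  memory: 512M
  instances: 2
  buildpack: nodejs_buildpack
  services:
    - my-database   # existing service instance to bind to the app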

Buildpack:
  • A Cloud Foundry component that resolves app’s runtime dependencies
  • It provides framework and run time support for applications.
  • It is used to determine what dependencies to download
  • It is used to tell how to configure applications to communicate with different services.
  • It is used to compile or prepare the application for launch.
What happens when an app is pushed using cf push?
  • Upload: app files are sent to CF
  • Staging: an executable artifact is created (a droplet)
  • Running: the app starts on an app host

The app then receives web requests (if it binds to a TCP port).

List of cf commands:
cf command – Purpose
cf target – Sets or views the targeted organization or space
cf stop – Stops an application
cf start – Starts an app
cf set-env – Sets an environment variable for an application (cf set-env <var-name> <var-value>)
cf services – Lists all of the services that are available in the current space
cf restart – Stops all instances of the app, then starts them again. This causes downtime.
cf restage – Recreates the app’s executable artifact using the latest pushed app files and the latest environment (variables, service bindings, buildpack, stack, etc.). This action will cause app downtime.
cf rename – Renames an app
cf push – Deploys a new application (cf push <app-name>)
cf marketplace – Lists all of the services that are available in the marketplace
cf logs – Displays the STDOUT and STDERR log streams of an application (cf logs <app-name>)
cf login -a <API-URL> – Logs in to CF
cf help – Shows help
cf events – Displays runtime events that are related to an application (cf events <app-name>)
cf delete – Deletes an existing application (cf delete <app-name>)
cf create-space – Creates a space (cf create-space <space-name>)
cf bind-service – Binds an existing service instance to your application
cf apps – Lists all of the applications that you deployed in the current space. The status of each application is also displayed.
cf api – Views the current API endpoint
cf -v – Displays the version of the Cloud Foundry command line interface

What is Cross Browser Testing?

Cross Browser Testing is a type of Functional Test to check whether web application works as expected on different browsers.

(Or)

Cross-browser testing is basically running the same set of test cases multiple times on different browsers.

The two main intents of cross-browser testing are:
  1. Functionality of the page in different browsers – does every feature behave the same way in each browser.
  2. Appearance of the page in different browsers – is it the same, is it different, is one better than the other, etc.

Note: In recent years, testing on mobile browsers has been included in the cross-browser testing scope.

When this testing can be started?

Any testing reaps the best benefits when it is done early on. Therefore, the industry recommendation is to start as soon as the page designs are available, because finding and fixing bugs in the early stages is very cost-effective. Finding bugs after release or completion of the application is not.

Cross Browser testing through Manual:

Sure, it can be done manually. First, the business needs to identify all the browsers that the application needs to support. Testers then need to run all the test cases against every identified browser and observe whether the appearance and functionality are the same.

Through manual testing, it is not possible to cover many browsers and their major versions. So, performing cross-browser testing manually will be costly and time-consuming too.

In an Agile world, it is not advisable to do all cross-browser testing manually.

Cross Browser testing through Automation:

As stated above, Cross-browser testing is basically running the same set of test cases multiple times on different browsers. This type of repeated task is best suited for automation. Thus, it’s more cost and time effective to perform this testing by using tools.

Selenium for Cross Browser Testing:

Selenium is well known for automated testing of web-based applications. Just by changing the browser used for running the test cases, Selenium makes it very easy to run the same test cases multiple times using different browsers.

Note: In the rest of this blog, we are going to see how Selenium can be used for cross-browser testing.

Advantages of choosing Selenium:
  • Open source
  • Supports programming languages like Java, Perl, Python, C#, Ruby, Groovy, JavaScript, etc.
  • Platform Independent: Supports (OS) like Windows, Mac, Linux, UNIX, etc.
  • Supports multiple browsers namely, Internet Explorer, Chrome, Firefox, Opera, Safari, etc
  • Ease of implementation
  • Reusability

By using TestNG along with Selenium Grid, we can achieve parallel test execution on different browsers on different machines. Let’s look at TestNG and Selenium Grid under the following topics.

TestNG:

TestNG is an automation testing framework in which NG stands for “Next Generation”. TestNG is inspired by JUnit and uses annotations (@). Default Selenium tests do not generate test results in a proper format; using TestNG, we can generate properly formatted test results.

Why TestNG?
  • Multiple test cases can be grouped easily by adding them to the testng.xml file, in which you can also set priorities that define which test case should be executed first.
  • The same test case can be executed multiple times without loops just by using the keyword ‘invocation count’.
  • Using TestNG, you can execute multiple test cases on multiple browsers
  • It can be easily integrated with tools like Maven, Jenkins, etc.
Selenium Grid

Selenium Grid is a part of the Selenium Suite that specialises in running multiple tests across different browsers, operating systems, and machines. You can connect to it with a Selenium RemoteWebDriver by specifying the browser, browser version, and operating system you want.

Components of Selenium Grid
Hub:

In Selenium Grid, the hub is the computer that acts as the central point into which we load our tests. The hub also acts as a server, controlling the network of test machines. A Selenium Grid has only one hub, and it is the master of the network.

Nodes

In Selenium Grid, a node is a test machine that connects to the hub. The hub uses these machines to run tests on. A Grid network can have multiple nodes. Nodes can be on different platforms, i.e. different operating systems and browsers, and a node does not need to run on the same platform as the hub.

Advantages of Selenium Grid
  • Selenium Grid allows running multiple tests across different web browsers, operating systems, and machines. This ensures compatibility of the application under test across multiple combinations of web browsers, operating systems, and hardware architectures.
  • It speeds up test suite completion time as it can run multiple tests in parallel. For example, if we have 10 nodes and need to execute a test suite of 50 tests, it will take roughly one-tenth of the time that a single machine running the suite without Selenium Grid would take.
Disadvantage of Selenium Grid
  • Extra cost to project as it requires additional machines as Nodes
Grid Code Snippets:
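
As a minimal sketch (assuming the Selenium 3 standalone server JAR; the version shown is one common Selenium 3 release and the hub IP is a placeholder), a hub and a node can be started from the command line, after which tests point their RemoteWebDriver at the hub URL:

# Start the hub (listens on port 4444 by default)
java -jar selenium-server-standalone-3.141.59.jar -role hub

# Start a node on another machine and register it with the hub
java -jar selenium-server-standalone-3.141.59.jar -role node -hub http://<hub-ip>:4444/grid/register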

What is Jenkins?

Jenkins is an open source automation tool written in Java with plugins built for Continuous Integration purpose. Jenkins is used to build and test your software projects continuously making it easier for developers to integrate changes to the project, and making it easier for users to obtain a fresh build. It also allows you to continuously deliver your software by integrating with a large number of testing and deployment technologies.

With Jenkins, organizations can accelerate the software development process through automation. Jenkins integrates development life-cycle processes of all kinds, including build, document, test, package, stage, deploy, static analysis and much more.

Jenkins achieves Continuous Integration with the help of plugins. Plugins allow the integration of various DevOps stages. If you want to integrate a particular tool, you need to install the plugins for that tool, for example: Git, Maven 2 project, Amazon EC2, HTML publisher, etc.

Advantages of Jenkins include:

  • It is an open source tool with great community support.
  • It is easy to install.
  • It has 1000+ plugins to ease your work. If a plugin does not exist, you can code it and share with the community.
  • It is free of cost.
  • It is built with Java and hence, it is portable to all the major platforms
What is Continuous Integration?

Continuous Integration is a development practice in which the developers are required to commit changes to the source code in a shared repository several times a day or more frequently. Every commit made in the repository is then built. This allows the teams to detect the problems early. Apart from this, depending on the Continuous Integration tool, there are several other functions like deploying the build application on the test server, providing the concerned teams with the build and test results etc.

Continuous Integration with Jenkins
  • First, a developer commits the code to the source code repository. Meanwhile, the Jenkins server checks the repository at regular intervals for changes.
  • Soon after a commit occurs, the Jenkins server detects the changes that have occurred in the source code repository. Jenkins will pull those changes and will start preparing a new build.
  • If the build fails, then the concerned team will be notified.
  • If the build is successful, then Jenkins deploys the build to the test server.
  • After testing, Jenkins generates feedback and then notifies the developers about the build and test results.
  • It will continue to check the source code repository for changes made in the source code and the whole process keeps on repeating.
Jenkins Distributed Architecture

Jenkins uses a Master-Slave architecture to manage distributed builds. In this architecture, the Master and Slaves communicate through the TCP/IP protocol.

Jenkins Master

Your main Jenkins server is the Master. The Master’s job is to handle:

  • Scheduling build jobs.
  • Dispatching builds to the slaves for the actual execution.
  • Monitoring the slaves (possibly taking them online and offline as required).
  • Recording and presenting the build results.
  • A Master instance of Jenkins can also execute build jobs directly.
Jenkins Slave

A Slave is a Java executable that runs on a remote machine. Following are the characteristics of Jenkins Slaves:

  • It hears requests from the Jenkins Master instance.
  • Slaves can run on a variety of operating systems.
  • The job of a Slave is to do as they are told to, which involves executing build jobs dispatched by the Master.
  • You can configure a project to always run on a particular Slave machine, or a particular type of Slave machine, or simply let Jenkins pick the next available Slave.
What is a Jenkins pipeline?

A pipeline is a collection of jobs that brings the software from version control into the hands of the end users by using automation tools. It is a feature used to incorporate continuous delivery in our software development workflow.

Over the years, there have been multiple Jenkins pipeline releases including, Jenkins Build flow, Jenkins Build Pipeline plugin, Jenkins Workflow, etc. What are the key features of these plugins?

  • They represent multiple Jenkins jobs as one whole workflow in the form of a pipeline.
  • What do these pipelines do? These pipelines are a collection of Jenkins jobs which trigger each other in a specified sequence.

Let’s look at an example. Suppose I’m developing a small application on Jenkins and I want to build, test and deploy it. To do this, I will allot 3 jobs to perform each process. So, job1 would be for build, job2 would perform tests and job3 for deployment. I can use the Jenkins build pipeline plugin to perform this task. After creating three jobs and chaining them in a sequence, the build plugin will run these jobs as a pipeline.

This approach is effective for deploying small applications. But what happens when there are complex pipelines with several processes (build, test, unit test, integration test, pre-deploy, deploy, monitor) running 100’s of jobs?

The maintenance cost for such a complex pipeline is huge and increases with the number of processes. It also becomes tedious to build and manage such a vast number of jobs. To overcome this issue, a new feature called Jenkins Pipeline Project was introduced.

The key feature of this pipeline is to define the entire deployment flow through code. What does this mean? It means that all the standard jobs defined by Jenkins are manually written as one whole script and they can be stored in a version control system. It basically follows the ‘pipeline as code’ discipline. Instead of building several jobs for each phase, you can now code the entire workflow and put it in a Jenkinsfile. Below is a list of reasons why you should use the Jenkins Pipeline.

Jenkins Pipeline Advantages
  • It models simple to complex pipelines as code by using Groovy DSL (Domain Specific Language)
  • The code is stored in a text file called the Jenkinsfile which can be checked into a SCM (Source Code Management)
  • Improves user interface by incorporating user input within the pipeline
  • It is durable in terms of unplanned restart of the Jenkins master
  • It can restart from saved checkpoints
  • It supports complex pipelines by incorporating conditional loops, fork or join operations and allowing tasks to be performed in parallel
  • It can integrate with several other plugins
What is a Jenkinsfile?

A Jenkinsfile is a text file that stores the entire workflow as code and it can be checked into a SCM on your local system. How is this advantageous? This enables the developers to access, edit and check the code at all times.

The Jenkinsfile is written using the Groovy DSL and it can be created through a text/groovy editor or through the configuration page on the Jenkins instance. It is written based on two syntaxes, namely:

  • Declarative pipeline syntax
  • Scripted pipeline syntax

Declarative pipeline is a relatively new feature that supports the pipeline as code concept. It makes the pipeline code easier to read and write. This code is written in a Jenkinsfile which can be checked into a source control management system such as Git.

Whereas, the scripted pipeline is a traditional way of writing the code. In this pipeline, the Jenkinsfile is written on the Jenkins UI instance. Though both these pipelines are based on the groovy DSL, the scripted pipeline uses stricter groovy based syntaxes because it was the first pipeline to be built on the groovy foundation. Since this Groovy script was not typically desirable to all the users, the declarative pipeline was introduced to offer a simpler and more optioned Groovy syntax.

The declarative pipeline is defined within a block labelled ‘pipeline’ whereas the scripted pipeline is defined within a ‘node’

An example Jenkinsfile looks like this:

pipeline {
  environment {
    BUILD_SCRIPTS_GIT="http://10.100.100.10:7990/scm/~myname/mypipeline.git"
    BUILD_SCRIPTS='mypipeline'
    BUILD_HOME='/var/lib/jenkins/workspace'
  }
  agent any
  stages {
    stage('Checkout: Code') {
      steps {
        sh "mkdir -p $WORKSPACE/repo;\
git config --global user.email '[email protected]';\
git config --global user.name 'myname';\
git config --global push.default simple;\
git clone $BUILD_SCRIPTS_GIT repo/$BUILD_SCRIPTS"
        sh "chmod -R +x $WORKSPACE/repo/$BUILD_SCRIPTS"
      }
    }
    stage('Yum: Updates') {
      steps {
        sh "sudo chmod +x $WORKSPACE/repo/$BUILD_SCRIPTS/scripts/update.sh"
        sh "sudo $WORKSPACE/repo/$BUILD_SCRIPTS/scripts/update.sh"
      }
    }
  }
  post {
    always {
      cleanWs()
    }
  }
}

The above Jenkinsfile does the following:

  • sets up environment variables
  • pulls data down from a git repo
  • sets it up in a Jenkins workspace
  • runs a script under scripts/
  • once complete, cleans up the workspace (successful or not)
Pipeline concepts
  • Pipeline

This is a user defined block which contains all the processes such as build, test, deploy, etc. It is a collection of all the stages in a Jenkinsfile. All the stages and steps are defined within this block. It is the key block for a declarative pipeline syntax.

  • Node

A node is a machine that executes an entire workflow. It is a key part of the scripted pipeline syntax.

There are various mandatory sections which are common to both the declarative and scripted pipelines, such as stages, agent and steps that must be defined within the pipeline. These are explained below:

  • Agent

An agent is a directive that can run multiple builds with only one instance of Jenkins. This feature helps to distribute the workload to different agents and execute several projects within a single Jenkins instance. It instructs Jenkins to allocate an executor for the builds.

A single agent can be specified for an entire pipeline or specific agents can be allotted to execute each stage within a pipeline. Few of the parameters used with agents are:

  • Any

Runs the pipeline/ stage on any available agent.

  • None

This parameter is applied at the root of the pipeline and it indicates that there is no global agent for the entire pipeline and each stage must specify its own agent.

  • Label

Executes the pipeline/stage on the labelled agent.

  • Docker

This parameter uses a docker container as an execution environment for the pipeline or a specific stage. For example, docker can pull an ubuntu image and use it as the execution environment to run multiple commands.

  • Stages

This block contains all the work that needs to be carried out. The work is specified in the form of stages. There can be more than one stage within this directive, and each stage performs a specific task, as the ‘Checkout: Code’ and ‘Yum: Updates’ stages do in the Jenkinsfile shown earlier.

  • Steps

A series of steps can be defined within a stage block. These steps are carried out in sequence to execute a stage, and there must be at least one step within a steps directive. For example, an echo or sh command placed within a ‘Build’ stage is executed as part of that stage, like the sh steps in the Jenkinsfile shown earlier.

Continuous Integration (CI) is a development practice where developers integrate code into a shared repository frequently, preferably several times a day. Each integration can then be verified by an automated build and automated tests. While automated testing is not strictly part of CI it is typically implied.

One of the key benefits of integrating regularly is that you can detect errors quickly and locate them more easily. As each change introduced is typically small, pinpointing the specific change that introduced a defect can be done quickly.

In recent years CI has become a best practice for software development and is guided by a set of key principles. Among them are revision control, build automation and automated testing.

Benefits and Advantages of Continuous Integration

Continuous Integration has many benefits. A good CI setup speeds up your workflow and encourages the team to push every change without being afraid of breaking anything. There are more benefits to it than just working with a better software release process. Continuous Integration brings great business benefits as well.

  • Reduces the time and effort for integrations of different code changes
  • Enables a quick feedback mechanism on every change
  • Allows earlier detection and prevention of defects
  • Helps collaboration between team members so recent code is always shared
  • Reduces manual testing effort
  • Building features more incrementally saves time on the debugging side so you can focus on adding features
  • First step into fully automating the whole release process
  • Prevents divergence in different branches as they are integrated regularly
Continuous Integration Tools

Jenkins

Jenkins is a cross-platform open source CI tool written in Java. It offers configuration through both the GUI interface and the console commands. Jenkins is a very flexible tool to use because it offers an extension of features through plugins. Its plugin list is very broad, and one can easily add their own plugins to that list. Furthermore, Jenkins can distribute software builds and test loads on several machines.

Travis CI

Travis CI is an open source CI service that is free for all open source projects hosted on GitHub. Since Travis CI is hosted, it is platform independent. It is configured using a .travis.yml file in the repository, which describes how to build the project. Travis CI supports a variety of software languages and ships default build configurations for each of them. Travis CI uses virtual machines to build applications.

TeamCity

TeamCity is a sophisticated Java-based CI tool offered by JetBrains. It supports Java, .NET and Ruby platforms. TeamCity has a range of free plugins available, developed both by JetBrains and third parties. It also offers integration with several IDEs, including Eclipse, IntelliJ IDEA and Visual Studio. Moreover, TeamCity allows simultaneous running of multiple builds and tests in different platforms and environments.

GitLab CI

GitLab CI is hosted on the free hosting service GitLab.com, and it offers Git repository management with features such as access control, bug tracking, and code review. GitLab CI is completely unified with GitLab, and it can easily be used to link projects using the GitLab API. GitLab CI builds are executed by runners written in the Go language, which run on several operating systems such as Windows, Linux, Docker, OSX, and FreeBSD.

CircleCI

CircleCI is a CI tool for projects hosted on GitHub. It supports several languages, including Java, Python, Ruby/Rails, Node.js, PHP, Scala and Haskell. It offers services based on containers. CircleCI offers one container for free, and any number of projects can be built on it. It offers up to five levels of parallelism (1x, 4x, 8x, 12x and 16x), so a maximum parallelism of 16x can be achieved for one build. CircleCI also supports the Docker platform.

Bamboo

Bamboo is a CI tool developed by Atlassian. Bamboo is available in two versions, cloud and server. For the cloud version, Atlassian offers hosting service with the help of Amazon EC2 account. For the server version, self-hosting needs to be done. Bamboo supports well known Atlassian products, JIRA and BitBucket.

Machine Learning

Artificial Intelligence

Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions) and self-correction.

Machine Learning

Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

In Traditional Programming, data and program are run on the computer to produce the output. In Machine Learning, data and output are run on the computer to create a program. The program can be used in traditional programming.

Machine learning algorithms are often categorized as supervised or unsupervised.

Supervised Learning

Supervised learning is learning in which we teach or train the machine using data that is well labelled, meaning each example is already tagged with the correct answer. The machine is then provided with a new set of examples (data), so that the supervised learning algorithm, having analysed the training data (the set of training examples), produces a correct outcome from the labelled data.

Classification algorithms and regression algorithms are types of supervised learning. Classification algorithms are used when the outputs are restricted to a limited set of values. For a classification algorithm that filters emails, the input would be an incoming email, and the output would be the name of the folder in which to file the email. For an algorithm that identifies spam emails, the output would be the prediction of either “spam” or “not spam”, represented by the Boolean values true and false. Regression algorithms are named for their continuous outputs, meaning they may have any value within a range. Examples of a continuous value are the temperature, length, or price of an object.

Unsupervised Learning

Unsupervised learning is the training of a machine using information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences, without any prior training on the data. The most common unsupervised learning method is cluster analysis or clustering, which is used for exploratory data analysis to find hidden patterns or groupings in data.

Some simple Machine Learning algorithms

Linear Regression

Here, we establish a relationship between independent and dependent variables by fitting the best line. It is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on a continuous variable(s).

As an example, consider a model that predicts ice cream sales based on the temperature in a city. We need a weight (w) and a bias (b) to fit a straight line (y = wx + b). Drawn as a diagram, this is a single input node multiplied by a weight, with a bias added, feeding a single output node: the simplest Neural Network. A neural network is a system of hardware and/or software patterned after the operation of neurons in the human brain.
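
As a minimal sketch of this idea, the weight and bias of such a line can be estimated with NumPy's least-squares fit (the temperature/sales pairs below are illustrative values):

import numpy as np

# hypothetical training data: temperature (x) and ice cream sales (y)
x = np.array([58, 62, 52, 60, 66, 74, 68, 80], dtype=float)
y = np.array([215, 325, 185, 332, 406, 522, 412, 614], dtype=float)

# fit y = w*x + b by least squares (a degree-1 polynomial)
w, b = np.polyfit(x, y, deg=1)

print(f"weight w = {w:.2f}, bias b = {b:.2f}")
print("predicted sales at 70F:", w * 70 + b)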

Logistic Regression

Logistic Regression is a classification algorithm used to estimate discrete binary values (like 0/1, yes/no, true/false) based on given set of independent variables. Typically, this involves fitting a curve to separate 2 distinct classes of data points.

The neural network for logistic regression takes multiple inputs, each with its own weight, plus a bias, and produces output nodes for the two classes.
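
A minimal scikit-learn sketch of logistic regression, using a small hypothetical dataset (hours studied versus pass/fail), might look like this:

import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical data: hours studied (feature) and pass/fail outcome (label)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# predict the class and the class probabilities for a new data point
print(clf.predict([[4.5]]))        # predicted class (0 or 1)
print(clf.predict_proba([[4.5]]))  # probability for each class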

Deep Learning

Deep learning is a specific method of machine learning, and it’s based primarily on the use of neural networks.

In traditional supervised machine learning, systems require an expert to use his or her domain knowledge to specify the information (called features) in the input data that will best lead to a well-trained system. In Deep Learning, rather than specifying the features in our data that we think will lead to the best classification accuracy, we let the machine find this information on its own. Often, it can look at the problem in a way that even an expert wouldn’t have been able to imagine.

Neural Network Terminology

Activation function

The activation function of a node defines the output of that node, or “neuron”, given an input or set of inputs. This output is then used as input for the next node, and so on, until a desired solution to the original problem is found. Some of the most commonly used activation functions are sigmoid, tanh and ReLU.
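
As a small illustration, these three functions can be written in a few lines of NumPy:

import numpy as np

def sigmoid(x):
    # squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes any real value into the range (-1, 1)
    return np.tanh(x)

def relu(x):
    # passes positive values through, clips negative values to 0
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x))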

Input / Output / Hidden Layers

As the names suggest, the input layer is the one which receives the input and is essentially the first layer of the network, while the output layer is the one which generates the output and is the final layer of the network. The processing layers between them are the hidden layers. These hidden layers perform specific tasks on the incoming data and pass the output they generate on to the next layer. The input and output layers are visible to us, while the intermediate layers are hidden.

Forward propagation

Forward Propagation refers to the movement of the input through the hidden layers to the output layers. In forward propagation, the information travels in a single direction FORWARD. The input layer supplies the input to the hidden layers and then the output is generated. There is no backward movement.

Cost / Loss function

When we build a network, the network tries to predict the output as close as possible to the actual value. We measure this accuracy of the network using the loss function. The loss function tries to penalize the network when it makes errors. Our objective while running the network is to increase our prediction accuracy and to reduce the error, hence minimizing the loss function. The most optimized output is the one with the least value of the loss function. If we define the loss function to be the mean squared error, it can be written as –

C = (1/m) Σ (y - a)², where m is the number of training inputs, a is the predicted value and y is the actual value of that example.

The learning process revolves around minimizing the cost.

Gradient Descent

Gradient descent is an optimization algorithm for minimizing the cost. To think of it intuitively, while climbing down a hill you should take small steps and walk down instead of just jumping down at once. Therefore, what we do is, if we start from a point x, we move down a little i.e. delta h, and update our position to x-delta h and we keep doing the same till we reach the bottom. Consider bottom to be the minimum cost point.

Mathematically, to find the local minimum of a function one takes steps proportional to the negative of the gradient of the function.
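
A minimal NumPy sketch of gradient descent, fitting the same y = wx + b line by repeatedly stepping against the gradient of the mean squared error (the data and learning rate below are assumed values), might look like this:

import numpy as np

# hypothetical data roughly following y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

w, b = 0.0, 0.0       # start from arbitrary parameters
learning_rate = 0.01  # size of each downhill step

for epoch in range(2000):
    y_pred = w * x + b
    error = y_pred - y                # (a - y) for each example
    # gradients of the mean squared error with respect to w and b
    dw = 2 * np.mean(error * x)
    db = 2 * np.mean(error)
    # step in the direction of the negative gradient
    w -= learning_rate * dw
    b -= learning_rate * db

print(f"w = {w:.2f}, b = {b:.2f}")  # should end up close to 2 and 1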

Learning Rate

The rate at which we descend towards the minimum of the cost function is the learning rate. We should choose the learning rate carefully: it should be neither so large that the optimal solution is missed, nor so low that it takes forever for the network to converge.

Backpropagation

When we define a neural network, we assign random weights and bias values to our nodes. Once we have received the output for a single iteration, we can calculate the error of the network. This error is then fed back to the network along with the gradient of the cost function to update the weights of the network. These weights are then updated so that the errors in subsequent iterations are reduced. This updating of weights using the gradient of the cost function is known as back-propagation.

Steps in training a Neural Network
  • Initialize weights and biases.
  • Forward propagation: Using the input X, weights W and biases b, for every layer we compute Z and A, the linear and non-linear activations. At the final layer, we compute f(A^(L-1)), which could be a sigmoid, softmax or linear function of A^(L-1), and this gives the prediction y_hat.
  • Compute the loss function: This is a function of the actual label y and predicted label y_hat. It captures how far off our predictions are from the actual target. Our objective is to minimize this loss function.
  • Backward propagation: In this step, we calculate the gradients of the loss function f(y, y_hat) with respect to A, W, and b, called dA, dW and db. Using these gradients, we update the values of the parameters from the last layer to the first.
  • Repeat steps 2–4 for n iterations/epochs until we feel we have minimized the loss function without overfitting the training data (a minimal sketch of this loop is given after this list).
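
A minimal NumPy sketch of this training loop, for a tiny one-hidden-layer network learning the XOR function (the layer sizes, learning rate and iteration count are arbitrary choices for illustration), might look like this:

import numpy as np

np.random.seed(0)

# XOR inputs and labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 1. initialize weights and biases
W1 = np.random.randn(2, 4); b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 1); b2 = np.zeros((1, 1))
lr = 0.5

for i in range(5000):
    # 2. forward propagation
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)          # prediction y_hat

    # 3. loss function (mean squared error)
    loss = np.mean((y - A2) ** 2)

    # 4. backward propagation (gradients of the loss, up to a constant factor)
    dA2 = (A2 - y) * A2 * (1 - A2)
    dW2 = A1.T @ dA2; db2 = dA2.sum(axis=0, keepdims=True)
    dA1 = (dA2 @ W2.T) * A1 * (1 - A1)
    dW1 = X.T @ dA1; db1 = dA1.sum(axis=0, keepdims=True)

    # 5. update parameters and repeat
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("final loss:", round(loss, 4))
print(np.round(A2, 2))  # predictions should approach [0, 1, 1, 0]
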
Machine Learning using Python

Simple Machine Learning models like Linear Regression can be trained using the python library scikit-learn. Neural Networks are built and trained using the libraries Keras, TensorFlow or PyTorch.

In the simple example below, we are building a linear regression model to predict ice cream sales based on temperature. 80% of the available data is used for training the model and the remaining 20% is used for testing it.

  
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# load the dataset
Ice_Cream_Data = {'Temprature_in_Fahrenheit': [58, 62, 52, 60, 66, 74, 68, 80, 76, 74, 64],
                  'Ice_Cream_sales': [215, 325, 185, 332, 406, 522, 412, 614, 544, 44500000, 408]}

df = pd.DataFrame(Ice_Cream_Data, columns=['Temprature_in_Fahrenheit', 'Ice_Cream_sales'])

X = df[['Temprature_in_Fahrenheit']]
Y = df['Ice_Cream_sales']

# splitting X and Y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

# create linear regression object
reg = LinearRegression()

# train the model using the training sets
reg.fit(X_train, y_train)

# prediction
y_predict = reg.predict(X_test)
print("R-squared on test data:", r2_score(y_test, y_predict))

## plotting residual errors in training data
plt.scatter(reg.predict(X_train), reg.predict(X_train) - y_train, color="green", s=10, label='Train data')
## plotting residual errors in test data
plt.scatter(reg.predict(X_test), reg.predict(X_test) - y_test, color="blue", s=10, label='Test data')
## plotting line for zero residual error
plt.hlines(y=0, xmin=0, xmax=2000, linewidth=2)
## plotting legend
plt.legend(loc='upper right')
## plot title
plt.title("Residual errors")
## show plot
plt.show()
 
      

RABBITMQ

What is RabbitMQ?

RabbitMQ is an open source message broker software. It accepts messages from producers and delivers them to consumers. It acts like a middleman which can be used to reduce the load and delivery times of web application servers.

Features of RabbitMQ:

  • Robust messaging for building applications in a distributed manner.
  • Easy to use
  • Runs on all major Operating Systems.
  • Supports a huge number of developer platforms
  • Supports multiple messaging protocols, message queuing, delivery acknowledgement, flexible routing to queues, and multiple exchange types.
  • Open source and commercially supported
How RabbitMQ Works?

The producer sends messages to an exchange. An exchange is responsible for routing the messages to different queues: it accepts messages from the producer application and routes them to message queues with the help of bindings and routing keys. A binding is a link between a queue and an exchange, and the routing key determines which queue(s) a message is routed to. Consumers then receive messages from the queue.
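
As an illustrative Pika sketch (the exchange, queue and routing key names here are made up), declaring an exchange, binding a queue to it and publishing through it looks roughly like this:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

# declare a direct exchange and a queue
channel.exchange_declare(exchange='orders', exchange_type='direct')
channel.queue_declare(queue='order_processing')

# the binding links the queue to the exchange for a given routing key
channel.queue_bind(exchange='orders', queue='order_processing', routing_key='new_order')

# messages published with that routing key are routed to the bound queue
channel.basic_publish(exchange='orders', routing_key='new_order', body='order #42')
connection.close()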

Prerequisites:
  • RabbitMQ
  • Python

How to Send and Receive a Message using RabbitMQ?

Send a Message using RabbitMQ:

The following program, send.py, sends a single message to the queue.

Step 1: Establish a connection with the RabbitMQ server.

 
       
        import pika
        
        connection = pika.BlockingConnection(
            pika.ConnectionParameters(host='localhost'))
        channel = connection.channel()
       
 

Step 2: Create a hello queue to which the message will be delivered:


channel.queue_declare(queue='hello')
     
   

Step 3: Publish the message, specifying the exchange in the exchange parameter and the queue name in the routing_key parameter so that RabbitMQ knows which queue the message should go to.


channel.basic_publish(exchange='', routing_key='hello', body='Hello RabbitMQ!')
print(" [x] Sent 'Hello RabbitMQ!'")
        

Step 4: Close the connection to make sure the network buffers are flushed and our message is actually delivered to RabbitMQ.


connection.close()
       
 
Receive a message using RabbitMQ:

The following program, receive.py, consumes messages from the queue.

Step 1: Receiving works by subscribing a callback function to a queue. Whenever a message is received, this callback function is called by the Pika library. The following function will print the contents of the message on the screen.


def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
       
 

Step 2: Next, we need to tell RabbitMQ that this particular callback function should receive messages from our hello queue:


channel.basic_consume(queue='hello', on_message_callback=callback, auto_ack=True)

        

Step 3: Finally, enter a never-ending loop that waits for data and runs callbacks whenever necessary.


print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()
     
   

Step 4: Open a terminal and run send.py. The producer program will stop after every run:

python send.py
[x] Sent 'Hello RabbitMQ!'

We can go to the web browser, hit the URL http://localhost:15672/, and see the count of messages sent in the RabbitMQ management dashboard.

Step 5: Open terminal. Run the receive.py program.

python receive.py
[*] Waiting for messages. To exit press CTRL+C
[x] Received ‘Hello RabbitMQ!’

If the Ready and Total counts in the dashboard drop to zero, the messages have been received by the consumer.

Note: you can keep sending messages through RabbitMQ. As you will notice, the receive.py program doesn’t exit; it stays ready to receive further messages and can be interrupted with Ctrl-C.
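
Putting the steps together, a complete receive.py, reusing the same connection setup as send.py, might look like this:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

# declaring the queue again is safe; it is only created if it does not already exist
channel.queue_declare(queue='hello')

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)

channel.basic_consume(queue='hello', on_message_callback=callback, auto_ack=True)

print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()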

ELK – Elasticsearch, Logstash & Kibana

Introduction

As more and more IT infrastructures move to public clouds such as Amazon Web Services, Microsoft Azure, and Google Cloud, public cloud security tools, and logging platforms are both becoming more and more critical.

The ELK Stack is popular because it fulfills a specific need in the log management and log analysis space. In cloud-based infrastructures, consolidating log outputs to a central location from different sources like web servers, mail servers, database servers, network appliances can be particularly useful. This is especially true when trying to make better data-informed decisions. The ELK stack simplifies searching and analyzing data by providing insights in real-time from the log data.

It is common to run the full ELK stack rather than each individual component separately. Each of these services plays an important role, and in order to perform under high demand it is advantageous to deploy each service on its own server.

Why ELK?
• Rapid on-premise (or cloud) installation and easy to deploy
• Scales vertically and horizontally
• Easy to use, with a variety of APIs
• Ease of writing queries, a lot easier than writing a MapReduce job
• Availability of libraries for most programming/scripting languages
• Elastic offers a host of language clients for Elasticsearch, including Ruby, Python, PHP, Perl, .NET, Java, JavaScript, and more
• Tools availability
• It’s free (open source), and it’s quick

The ELK Stack is used for log collection, indexing and visualization of the collected log data. We can collect any type of logs (Windows event logs, HTTP logs, Apache server logs, etc.) in the ELK Stack, depending on configuration.

Logstash:

Logstash is a light-weight, open-source, server-side data processing pipeline that allows you to collect data from a variety of sources, transform it on the fly, and send it to your desired destination. It is most often used as a data pipeline for Elasticsearch, an open-source analytics and search engine. Because of its tight integration with Elasticsearch, powerful log processing capabilities, and over 200 pre-built open-source plugins that can help you easily index your data, Logstash is a popular choice for loading data into Elasticsearch.

Logstash allows you to easily ingest unstructured data from a variety of data sources including system logs, website logs, and application server logs. Logstash offers pre-built filters, so you can readily transform common data types, index them in Elasticsearch, and start querying without having to build custom data transformation pipelines.

The server component of Logstash processes incoming logs. In other words, Logstash collects, parses, and enriches logs before indexing them into Elasticsearch.

It is the pipeline which collects log data and pushes the collected data to the Elasticsearch.

Logstash Input Plugins

• Stdin – reads events from standard input
• File – streams events from files (similar to “tail -0F”)
• Syslog – reads syslog messages as events
• Eventlog – pulls events from the Windows Event Log
• Imap – reads mail from an IMAP server
• Rss – reads events from RSS/Atom feeds
• Snmptrap – creates events based on SNMP trap messages
• Twitter – reads events from the Twitter Streaming API
• Irc – reads events from an IRC server
• Exec – captures the output of a shell command as an event
• Elasticsearch – reads query results from an Elasticsearch cluster

Logstash Filter Plugins

• Grok – parses unstructured event data into fields
• Mutate – performs mutations on fields
• Geoip – adds geographical information about an IP address
• Date – parses dates from fields to use as the Logstash timestamp for an event
• Cidr – checks IP addresses against a list of network blocks
• Drop – drops all events

Logstash Output Plugins

• Stdout – prints events to the standard output
• Csv – write events to disk in a delimited format
• Email – sends email to a specified address when output is received
• Elasticsearch – stores logs in Elasticsearch
• Exec – runs a command for a matching event
• File – writes events to files on disk
• mongoDB – writes events to MongoDB
• Redmine – creates tickets using the Redmine API

Elasticsearch

Elasticsearch is the Data storage and indexing part of the ELK Stack. It stores data and indexes it.

It is also a Search engine based on Lucene and provides a distributed search engine with an HTTP web interface and schema-free JSON documents.
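
For example, assuming Elasticsearch is running locally on its default port 9200, a JSON document can be indexed and searched over plain HTTP with the Python requests library (the index and field names below are made up):

import requests

# index a schema-free JSON document into the "logs" index
doc = {"service": "web", "level": "error", "message": "timeout calling payment API"}
requests.post("http://localhost:9200/logs/_doc", json=doc)

# full-text search for documents mentioning "timeout"
query = {"query": {"match": {"message": "timeout"}}}
resp = requests.get("http://localhost:9200/logs/_search", json=query)
print(resp.json()["hits"]["total"])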

The distributed nature of Elasticsearch enables it to process large volumes of data in parallel, quickly finding the best matches for your queries. Elasticsearch operations such as reading or writing data usually take less than a second to complete. This lets you use Elasticsearch for near real-time use cases such as application monitoring and anomaly detection.

An index is like a ‘database’ in a relational database. It has a mapping which defines multiple types. We can think of an index as a type of data organization mechanism, allowing the user to partition data a certain way.

Other key concepts of Elasticsearch are replicas and shards, the mechanism Elasticsearch uses to distribute data around the cluster. Elasticsearch implements a clustered architecture that uses sharding to distribute data across multiple nodes, and replication to provide high availability.

The index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards. A shard is a Lucene index and an Elasticsearch index is a collection of shards. The application talks to an index, and Elasticsearch routes the requests to the appropriate shards.

Kibana:

Kibana is the visualization web interface through which we can visualize the indexed log data. Kibana is an open-source data visualization and exploration tool used for log and time-series analytics, application monitoring, and operational intelligence use cases. It offers powerful and easy-to-use features such as histograms, line graphs, pie charts, heat maps, and built-in geospatial support.

How to Import BACPAC File Created from Azure SQL Database?

When you need to export a database for archiving or for moving to another platform, you can export the database schema and data to a BACPAC file. A BACPAC file is a ZIP file with an extension of BACPAC containing the metadata and data from a SQL Server database. A BACPAC file can be stored in Azure Blob storage or in local storage in an on-premises location and later imported back into Azure SQL Database or into a SQL Server on-premises installation.

Import BACPAC File to On-Premise SQL Server :

      C:\Program Files (x86)\Microsoft SQL Server\140\DAC\bin>
      SqlPackage.exe /a:import /sf:\\Userdb0.bacpac /tsn:SERVER-SQL\DEV2016 /tdn:Azure_Test /p:CommandTimeout=2400

Error :

When you try to import a BACPAC file created from the Azure environment, you might encounter the following error if the database contains an external data source reference.

TITLE: Microsoft SQL Server Management Studio
            
    Could not import package. 
    Warning SQL72012: The object [AzureProd] exists in the target, 
    but it will not be dropped even though you selected the 
    ‘Generate drop statements for objects that are in the target database but that 
    are not in the source’ check box. 
    Warning SQL72012: The object [AzureProd_Log] exists in the target,
    but it will not be dropped even though you selected the 
    ‘Generate drop statements for objects that are in the target database but that 
    are not in the source’ check box. 
    Error SQL72014: .Net SqlClient Data Provider: Msg 102, Level 15, State 1, 
    Line 1 Incorrect syntax near ‘EXTERNAL’. 
    Error SQL72045: Script execution error. The executed script: 
    CREATE EXTERNAL DATA SOURCE [DB_EXT_EDS]
    WITH (
    TYPE = RDBMS,
    LOCATION = N’sqlserver.database.windows.net’,
    DATABASE_NAME = N’AdventureWorks’, 
    CREDENTIAL = [DB_EXT_CRED] ); 
            
    

Solution :

Drop external Tables and External Data Source in Azure SQL Database and create BACPAC File again without those references.

Drop External Tables and External Data Source

            
        IF EXISTS 
        (
        SELECT 'x' FROM sys.external_tables)
        BEGIN
        DROP EXTERNAL TABLE EXT_Table1 
        DROP EXTERNAL TABLE EXT_Table2 
        DROP EXTERNAL TABLE EXT_Table3 
        END   
         
        IF EXISTS 
        ( 
        SELECT * FROM sys.external_data_sources 
        WHERE name ='DB_EXT_EDS' 
        ) 
        BEGIN 
        DROP EXTERNAL DATA SOURCE DB_EXT_EDS; 
        END  
            
        

If you can’t recreate BACPAC without dropping the tables, you can follow these steps.

  1. Change the file extension to zip, then decompress it into a folder. Surprisingly, a bacpac is actually just a zip file, not something proprietary and hard to get into.
  2. Find the model.xml file and edit it to remove the section that looks like this:
<Element Type="SqlExternalDataSource" Name="[BoxDataSrc]">
  <Property Name="DataSourceType" Value="1" />
  <Property Name="Location" Value="MYAZUREServer.database.windows.net" />
  <Property Name="DatabaseName" Value="MyAzureDb" />
  <Relationship Name="Credential">
    <Entry>
      <References Name="[SQL_Credential]" />
    </Entry>
  </Relationship>
</Element>

If you have multiple external data sources of this type, you will probably need to repeat step 2 for each one.

Save and close model.xml.

Now you need to re-generate the checksum for model.xml so that the bacpac doesn’t think it was tampered with (since you just tampered with it). Create a PowerShell file named computeHash.ps1 and put this code into it.

Generate Checksum

             
    $modelXmlPath = Read-Host "model.xml file path"
    $hasher = [System.Security.Cryptography.HashAlgorithm]::Create("System.Security.Cryptography.SHA256CryptoServiceProvider")
    $fileStream = New-Object System.IO.FileStream -ArgumentList @($modelXmlPath, [System.IO.FileMode]::Open)
    $hash = $hasher.ComputeHash($fileStream)
    $hashString = ""
    Foreach ($b in $hash) { $hashString += $b.ToString("X2") }
    $fileStream.Close()
    $hashString
            
     

Run the PowerShell script and give it the filepath to your unzipped and edited model.xml file. It will return a checksum value.
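
If PowerShell is not convenient, an equivalent SHA-256 checksum can also be computed with a short Python sketch (the file path below is an assumption):

import hashlib

# path to the edited model.xml extracted from the bacpac (assumed location)
with open("model.xml", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest().upper()

# paste this value into the <Checksum> element of Origin.xml
print(digest)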

Copy the checksum value, then open up Origin.xml and replace the existing checksum, toward the bottom on the line that looks like this:

<Checksum Uri="/model.xml">9EA0F06B282G4F42955C78A98822A31AA0ED0225CB131B8759379055A482D01G</Checksum>

Save and close Origin.xml, then select all the files and put them into a new zip file and rename the extension to bacpac.

Now you can use this new bacpac to import the database without getting the error.

Analytics

Analytics is the discovery, interpretation, and communication of meaningful patterns in data; and the process of applying those patterns towards effective decision making. In other words, analytics can be understood as the connective tissue between data and effective decision making, within an organization. Organizations may apply analytics to business data to describe, predict, and improve business performance.
Big data analytics is the complex process of examining large and varied data sets — or big data — to uncover information including hidden patterns, unknown correlations, market trends and customer preferences that can help organizations make informed business decisions.

Glue, Athena and QuickSight are 3 services under the Analytics Group of services offered by AWS. Glue is used for ETL, Athena for interactive queries and Quicksight for Business Intelligence (BI).

Glue

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. We can create and run an ETL job with a few clicks in the AWS Management Console. We simply point AWS Glue to our data stored on AWS, and AWS Glue discovers our data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, our data is immediately searchable, queryable, and available for ETL.

In this blog we will look at 2 components of Glue – Crawlers and Jobs

Glue Crawlers

Glue crawlers can scan data in all kinds of repositories, classify it, extract schema information from it, and store the metadata automatically in the AWS Glue Data Catalog. From there it can be used to guide ETL operations.

Suppose we have a file named people.json in S3 with the below contents:

                
        {"name":"Ricky","age":22}
        {"name":"Jeff","age":36}
        {"name":"Geddy","age":62}
                
        

Below are the steps to crawl this data and create a table in AWS Glue to store this data:

  1. On the AWS Glue Console, click “Crawlers” and then “Add Crawler”
  2. Give a name for your crawler and click next
  3. Select S3 as data source and under “Include path” give the location of json file on S3.
  4. Since we are going to crawl data from only 1 dataset, select No in next screen and click Next
  5. In next screen select an IAM role which has access to the S3 data store
  6. Select Frequency as “Run on demand” in next screen.
  7. Select a database to store the crawler’s output. I chose a database named “saravanan”. If no database exists, add a database using the link given
  8. Review all details in next step and click Finish
  9. On next screen, click on “Run it now” to run the crawler
  10. The crawler runs for around a minute and finally you will be able to see status as Stopping / Ready with Tables added count as 1.
  11. Now you can go to Tables link and see that a table named “people_json” has been created under “Saravanan” database.
  12. Using the “View details” Action, and then scrolling down, you can see the schema for the table which Glue has automatically inferred and generated.
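
The same crawler can also be created and started programmatically. A rough boto3 sketch (the crawler name, IAM role ARN and S3 path below are placeholders) might look like this:

import boto3

glue = boto3.client('glue', region_name='us-east-1')

# create a crawler that scans the S3 path and writes tables to the "saravanan" database
glue.create_crawler(
    Name='people-json-crawler',
    Role='arn:aws:iam::123456789012:role/GlueCrawlerRole',   # placeholder IAM role
    DatabaseName='saravanan',
    Targets={'S3Targets': [{'Path': 's3://my-bucket/people/'}]},
)

# run it on demand
glue.start_crawler(Name='people-json-crawler')
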
Glue jobs

The AWS Glue Jobs system provides managed infrastructure to orchestrate our ETL workflow. We can create jobs in AWS Glue that automate the scripts we use to extract, transform, and transfer data to different locations. Jobs can be scheduled and chained, or they can be triggered by events such as the arrival of new data.

To add a new job using the console

  1. Open the AWS Glue console, and choose the Jobs tab.
  2. Choose Add job and follow the instructions in the Add job wizard. In this example, the job copies data from the table we created earlier to a parquet file named people-parquet in the same S3 bucket. After the job runs and completes, you will be able to verify in S3 that the output Parquet file has been created.
DynamicFrame

Glue Jobs use a data structure named DynamicFrame. A DynamicFrame is similar to a Spark DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on-the-fly when required, and explicitly encodes schema inconsistencies using a choice (or union) type.

Instead of just using the Python job which Glue generates, we can code our own jobs using DynamicFrames and have them run through Glue. The following Glue job selects specific fields from 2 Glue tables, renames some of the fields, joins the tables and writes the joined table to S3 in parquet format.

                
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

users = glueContext.create_dynamic_frame.from_catalog(
        database="saravanan",
        table_name="users")
users_courses = glueContext.create_dynamic_frame.from_catalog(
        database="saravanan",
        table_name="users_courses")

# keep only the fields we need and rename the ambiguous ones
users = users.select_fields(['AccountName', 'Id', 'UserName', 'FullName', 'Active']) \
        .rename_field('Active', 'UserActive')

users_courses = users_courses.select_fields(['UserId', 'Id', 'Name', 'Code', 'Active',
        'Complete', 'PercentageComplete', 'Overdue']) \
        .rename_field('Id', 'Course_Id') \
        .rename_field('Name', 'CourseName') \
        .rename_field('Code', 'CourseCode') \
        .rename_field('Active', 'CourseActive') \
        .rename_field('Complete', 'CourseComplete') \
        .rename_field('PercentageComplete', 'CoursePercentageComplete') \
        .rename_field('Overdue', 'CourseOverDue')

# join the two frames on the user id and drop the duplicated key column
joined_table = Join.apply(users, users_courses, 'Id', 'UserId').drop_fields(['Id'])

joined_table.toDF().write.parquet('s3://saravanan-glue/parquet_partitioned',
        partitionBy=['AccountName'])
                
        


Athena

Amazon Athena is an interactive query service that makes it easy to analyse data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and we pay only for the queries that we run.

Athena is easy to use. We must simply point to our data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare our data for analysis. This makes it easy for anyone with SQL skills to quickly analyse large-scale datasets.

Athena is out-of-the-box integrated with AWS Glue Data Catalog, allowing us to create a unified metadata repository across various services, crawl data sources to discover schemas and populate your Catalog with new and modified table and partition definitions, and maintain schema versioning.

Since Athena uses same Data Catalog as Glue, we will be able to query and view properties of the people_json table which we created earlier using Glue.

Also, we can create a new table in Athena using data from an S3 bucket.


Unlike Glue, we have to explicitly give the data format (CSV, JSON, etc) and specify the column names and types while creating the table in Athena.


We can also manually create and query tables using SQL.
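
Queries can also be submitted programmatically. A rough boto3 sketch that runs a SQL query against the people_json table (the region and S3 output location below are placeholders) might look like this:

import time
import boto3

athena = boto3.client('athena', region_name='us-east-1')

# submit the query; Athena writes results to the given S3 location
qid = athena.start_query_execution(
    QueryString="SELECT name, age FROM people_json WHERE age > 30",
    QueryExecutionContext={'Database': 'saravanan'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results/'},
)['QueryExecutionId']

# poll until the query finishes, then fetch the result rows
while athena.get_query_execution(QueryExecutionId=qid)['QueryExecution']['Status']['State'] in ('QUEUED', 'RUNNING'):
    time.sleep(1)

for row in athena.get_query_results(QueryExecutionId=qid)['ResultSet']['Rows']:
    print([col.get('VarCharValue') for col in row['Data']])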

QuickSight

Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy for us to deliver insights to everyone in our organization.

QuickSight lets us create and publish interactive dashboards that can be accessed from browsers or mobile devices. We can embed dashboards into our applications, providing our customers with powerful self-service analytics.

QuickSight easily scales to tens of thousands of users without any software to install, servers to deploy, or infrastructure to manage.

Below are the steps to create a sample Analysis in QuickSight:

  1. Any Analysis in QuickSight requires data from a Data Set. First click on the “Manage data” link at top right to list the Data Sets we currently have.
  2. To create a new Data Set, click the “New data set” link
  3. We can create Data Set from any of the Data sources listed here – uploading a file, S3, Athena table, etc.
  4. For our example, I am selecting Athena as data source and giving it a name “Athena source”. Then we must map this to a database / table in Athena.
  5. After we select the Athena table, QuickSight provides us an option to import the data to SPICE. SPICE is Amazon QuickSight’s in-memory optimized calculation engine, designed specifically for fast, adhoc data visualization. SPICE stores our data in a system architected for high availability, where it is saved until we choose to delete it.
  6. Using the Edit/Preview Data option above allows us to select the columns to be included in Data set and rename them if required.
  7. Once we click the “Save & visualize” link above, QuickSight starts creating an Analysis for us. For our exercise we will select the Table visual type from the list.
  8. Add Account Name and User_id by dragging them from “Fields list” to “Group by” and course_active to “Value”
  9. Now we will add 2 parameters for Account Name and Learner id by clicking on Parameters at bottom left. While creating the parameter use the option “Link to a data set field” for Values and link the parameter to the appropriate column in the Athena table
  10. Once the parameters are added, create controls for the parameters. If we are adding 2 parameters with controls, we have option of showing only relevant values for second parameter based on the values selected for first parameter. For this select the “Show relevant values only” checkbox.
  11. Next add 2 Custom filters for Account Name and Learner id. These filters should be mapped to the parameters we created earlier. For this, choose the Filter type as “Custom filter” and select the “Use parameters” checkbox.
  12. Now using the Visualize option, we can verify if our Controls are working correctly
  13. To share the Dashboard with others, use the share option on top towards the right and use publish dashboard. We can search for users of the AWS account by email and publish to selective users.
Messaging system

A Messaging System is responsible for transferring data from one application to another, so the applications can focus on data, but not worry about how to share it. In Big Data, an enormous volume of data is used.

Two types of messaging patterns are available:

Point-to-Point Messaging System

In a point-to-point system, messages are persisted in a queue. One or more consumers can consume the messages in the queue, but a particular message can be consumed by a maximum of one consumer only.

Publish-Subscribe Messaging System

In the publish-subscribe system, messages are persisted in a topic. Unlike point-to-point system, consumers can subscribe to one or more topic and consume all the messages in that topic. In the Publish-Subscribe system, message producers are called publishers and message consumers are called subscribers.

Kafka

Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and enables us to pass messages from one end-point to another.

The main terminologies are explained below.

Topics

A stream of messages belonging to a category is called a topic. Data is stored in topics. Kafka topics are analogous to radio / TV channels. Multiple consumers can subscribe to same topic and consume the messages.

Topics are split into partitions. For each topic, Kafka keeps a minimum of one partition. Each such partition contains messages in an immutable ordered sequence.

Partition offset

Each partitioned message has a unique sequence id called an offset. For each topic, the Kafka cluster maintains a partitioned log.

Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.

The Kafka cluster durably persists all published records—whether or not they have been consumed—using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space.

Replicas of partition

Replicas are nothing but backups of a partition. Replicas never serve reads or writes of data themselves; they are used to prevent data loss.

Brokers

Brokers are simple systems responsible for maintaining the published data. Each broker may have zero or more partitions per topic.

Kafka Cluster

A Kafka system having more than one broker is called a Kafka cluster.

Kafka Cluster Architecture

Zookeeper

ZooKeeper is used for managing and coordinating Kafka broker. ZooKeeper service is mainly used to notify producer and consumer about the presence of any new broker in the Kafka system or failure of the broker in the Kafka system.

Consumer Group

Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group.

Kafka features
  1. High Throughput: Provides support for hundreds of thousands of messages with modest hardware.
  2. Scalability: Highly scalable distributed system with no downtime
  3. Data Loss: Kafka ensures no data loss once configured properly
  4. Stream processing: Kafka can be used along with real time streaming applications like Spark and Storm
  5. Durability: Provides support to persisting messages on disk
  6. Replication: Messages can be replicated across clusters, which supports multiple subscribers
Installing and Getting started

Prerequisite: Install Java

  1. Download kafka .tgz file from https://kafka.apache.org/downloads
  2. Untar the file and go into the kafka directory
              
             > tar -xzf kafka_2.11-2.1.0.tgz
             > cd kafka_2.11-2.1.0 
             
  3. Start the zookeeper server using the properties in zookeeper.properties
              
            > bin/zookeeper-server-start.sh config/zookeeper.properties
              
            
  4. Start the Kafka broker using the properties in server.properties
              
            > bin/kafka-server-start.sh config/server.properties 
             
            
Create a topic

Let’s create a topic named “test” with a single partition and only one replica:
In below command 2181 is the port number we have specified in zookeeper.properties

            
        > bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
            
        

To list the current list of topics, we can query the zookeeper using below command:

                
                    > bin/kafka-topics.sh --list --zookeeper localhost:2181
        test
                
        
Command line producer

Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message. Run the producer and then type a few messages into the console to send to the server

(In below command 9092 is the port number we configured for the broker in server.properties)

                
                    > bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
        This is a message
        This is another message
                
        
Command line consumer

Kafka also has a command line consumer that will dump out messages to standard output.

                
                    > bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
        This is a message
        This is another message
                
        
Kafka producer and consumer using python

The Kafka producer and consumer can be coded in many languages like Java, Python, etc. In this section, we will see how to send and receive messages from a Kafka topic using Python.

  1. First we have to install the kafka-python package using the Python package manager.
                            
                                pip install kafka-python
                            
                            
  2. The Python program below reads records from an input file and sends them as messages to the test topic which we created in the previous section

    Producer.py

                    
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

with open('/example/data/inputfile.txt') as f:
    lines = f.readlines()

for line in lines:
    # kafka-python expects message values as bytes
    producer.send('test', line.encode('utf-8'))

# make sure all buffered messages are actually sent before the program exits
producer.flush()
                    
            

Below python program consumes the messages from the kafka topic and prints them on the screen.

Consumer.py

                
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'],
                         auto_offset_reset='earliest')
consumer.subscribe(['test'])

for msg in consumer:
    print(msg)
                
        

In the above program, we have auto_offset_reset='earliest'. This causes the consumer program to read all the messages from the beginning of the topic every time it is run.

Instead, if we set auto_offset_reset='latest', the consumer program will only read messages that arrive after it has started.

Helm is a package manager for Kubernetes that allows developers and operators to more easily package, configure, and deploy applications and services onto Kubernetes clusters.

What Does Kubernetes Helm Solve?

Kubernetes is known as a complex platform to understand and use. Kubernetes Helm helps make Kubernetes easier and faster to use:

Increased productivity – developers can deploy a pre-tested app via a Helm chart and focus on developing their applications, instead of spending time on deploying test environments to test their Kubernetes clusters

Existing Helm Charts – allow developers to get a working database, big data platform, CMS, etc. deployed for their application with one click. Developers can modify existing charts or create their own to automate dev, test or production processes.

Easier to start with Kubernetes – it can be difficult to get started with Kubernetes and learn how to deploy production-grade applications. Helm provides one click deployment of apps, making it much easier to get started and deploy your first app, even if you don’t have extensive container experience.

Decreased complexity – deployment of Kubernetes-orchestrated apps can be extremely complex. Using incorrect values in configuration files or failing to roll out apps correctly from YAML templates can break deployments. Helm Charts allow the community to preconfigure applications, defining values that are fixed and others that are configurable with sensible defaults, providing a consistent interface for changing configuration. This dramatically reduces complexity, and eliminates deployment errors by locking out incorrect configurations.

Production ready – running Kubernetes in production with all its components (pods, namespaces, deployments, etc.) is difficult and prone to error. With a tested, stable Helm chart, users can deploy to production with confidence, and reduce the complexity of maintaining a Kubernetes App Catalog.

No duplication of effort – once a developer has created a chart, tested and stabilized it once, it can be reused across multiple groups in an organization and outside it. Previously, it was much more difficult (but not impossible) to share Kubernetes applications and replicate them between environments.

Helm provides this functionality through the following components:

  • A command line tool, helm, which provides the user interface to all Helm functionality.
  • A companion server component, tiller, that runs on your Kubernetes cluster, listens for commands from helm, and handles the configuration and deployment of software releases on the cluster.
  • The Helm packaging format, called charts.
  • An official curated charts repository with prepackaged charts for popular open-source software projects.
Installing Helm

There are two parts to Helm: The Helm client (helm) and the Helm server (Tiller).

INSTALLING THE HELM CLIENT

The Helm client can be installed either from source, or from pre-built binary releases.

From the Binary Releases

Every release of Helm provides binary releases for a variety of OSes. These binary versions can be manually downloaded and installed.

Download your desired version

Unpack it (tar -zxvf helm-v2.0.0-linux-amd64.tgz)

Find the helm binary in the unpacked directory, and move it to its desired destination (mv linux-amd64/helm /usr/local/bin/helm)

INSTALLING TILLER

Tiller, the server portion of Helm, typically runs inside of your Kubernetes cluster. But for development, it can also be run locally, and configured to talk to a remote Kubernetes cluster.

The easiest way to install tiller into the cluster is simply to run helm init. This will validate that helm’s local environment is set up correctly (and set it up if necessary). Then it will connect to whatever cluster kubectl connects to by default (kubectl config view). Once it connects, it will install tiller into the kube-system namespace.

After helm init, you should be able to run kubectl get pods --namespace kube-system and see Tiller running.

USING HELM

A Chart is a Helm package. It contains all of the resource definitions necessary to run an application, tool, or service inside of a Kubernetes cluster.

A Repository is the place where charts can be collected and shared.

A Release is an instance of a chart running in a Kubernetes cluster. One chart can often be installed many times into the same cluster. And each time it is installed, a new release is created.

The “helm install” command can install from several sources:

  • A chart repository
  • A local chart archive (helm install foo-0.1.1.tgz)
  • An unpacked chart directory (helm install path/to/foo)
  • A full URL (helm install https://example.com/charts/foo-1.2.3.tgz)
Charts

Helm uses a packaging format called charts. A chart is a collection of files that describe a related set of Kubernetes resources.

THE CHART FILE STRUCTURE

A chart is organized as a collection of files inside of a directory. The directory name is the name of the chart (without versioning information). Thus, a chart describing WordPress would be stored in the wordpress/ directory.

Inside of this directory, Helm will expect a structure that matches this:

 
     wordpress/
     Chart.yaml          # A YAML file containing information about the chart
     LICENSE             # OPTIONAL: A plain text file containing the license for the chart
     README.md           # OPTIONAL: A human-readable README file
     requirements.yaml   # OPTIONAL: A YAML file listing dependencies for the chart
     values.yaml         # The default configuration values for this chart
     charts/             # A directory containing any charts upon which this chart depends.
     templates/          # A directory of templates that, when combined with values,
                         # will generate valid Kubernetes manifest files.
     templates/NOTES.txt # OPTIONAL: A plain text file containing short usage notes
 
EXAMPLE

Let’s build and publish a simple HTTP service that says “Hello world”.

Package and publish via Helm.

 
     Docker: Build and publish “Hello World”

     app/index.html:

        Hello world!

     Dockerfile:

        FROM busybox
        ADD app/index.html /www/index.html
        EXPOSE 8005
        CMD httpd -p 8005 -h /www; tail -f /dev/null

     Build, run and push the image:

        docker build -t hello-world .
        docker run -p 80:8005 hello-world
        ## open your browser and check http://localhost/
        docker login
        docker tag hello-world {your_dockerhub_user}/hello-world
        docker push {your_dockerhub_user}/hello-world:latest
 

Helm: build and install

We need helm chart files, just do:

 
     helm create helloworld-chart

        ## edit helloworld-chart/values.yaml to point at the image we just pushed:
        image:
          repository: {your_dockerhub_user}/hello-world
          tag: latest
          pullPolicy: IfNotPresent
        service:
          name: hello-world
          type: LoadBalancer
          externalPort: 80
          internalPort: 8005
 

Now, we need to package this helm chart

 
     helm package helloworld-chart --debug
        ## helloworld-chart-0.1.0.tgz file was created
        helm install helloworld-chart-0.1.0.tgz --name helloworld
        kubectl get svc --watch # wait for a IP
 
CHART REPOSITORIES

A chart repository is an HTTP server that houses one or more packaged charts. Any HTTP server that can serve YAML files and tar files and can answer GET requests can be used as a repository server.

Deadlocks in Azure SQL Database

Recently we were working with Azure Logic Apps to invoke Azure Functions. By default, a Logic App runs parallel threads, and we didn’t explicitly control the concurrency; we left the default values.

So the Logic App invoked several concurrent threads, which in turn invoked several Azure Functions. The problem was that the Azure Functions made database calls which caused deadlocks. In an ideal world, the database should be able to handle numerous concurrent calls without deadlocks. Our process works on a high percentage of shared data and we wanted to ensure consistency, so we had explicit transactions in our stored procedure calls. That was the root cause of the problem, and we didn’t want to remove the explicit transactions.

The solution we implemented to alleviate this problem was to run the process in sequence instead of parallel threads.

Logic App Concurrency Control Behavior

For each loops execute in parallel by default. Customize the degree of parallelism, or set it to 1 to execute in sequence.

Logic_App_Concurrency

Deadlock Exception

Transaction (Process ID 166) was deadlocked on lock
| communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Deadlock_Exception

Troubleshooting Deadlocks

So we have identified, through our Application Insights, that a deadlock happened in the database. The next logical question is: what caused this deadlock?

Azure SQL Server Deadlock Count

These queries identify the deadlock event time as well as the deadlock event details.

                
        SELECT * FROM sys.event_log
        WHERE event_type = 'deadlock';

        WITH CTE AS (
            SELECT CAST(event_data AS XML) AS [target_data_XML]
            FROM sys.fn_xe_telemetry_blob_target_read_file('dl', null, null, null)
        )
        SELECT
            target_data_XML.value('(/event/@timestamp)[1]', 'DateTime2') AS Timestamp,
            target_data_XML.query('/event/data[@name=''xml_report'']/value/deadlock') AS deadlock_xml,
            target_data_XML.query('/event/data[@name=''database_name'']/value').value('(/value)[1]', 'nvarchar(100)') AS db_name
        FROM CTE
                
        

You can save the Deadlock xml as xdl to view the Deadlock Diagram. This provides all the information we need to identify the root cause of the deadlock and take necessary steps to resolve the issue.


Grafana is an open-source, general purpose dashboard and graph composer, which runs as a web application.

You can monitor Azure services and applications from Grafana using the Azure Monitor data source plugin. The plugin gathers application performance data collected by the Application Insights SDK as well as infrastructure data provided by Azure Monitor. You can then display this data on your Grafana dashboard.

Grafana uses an Azure Active Directory service principal to connect to Azure Monitor APIs and collect metrics data. You must create a service principal to manage access to your Azure resources.
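
For illustration, the service principal can be created with the Azure CLI; the display name and subscription id below are placeholders:

        ## assumes the Azure CLI is installed and "az login" has been run
        az ad sp create-for-rbac \
          --name "grafana-monitoring" \
          --role "Reader" \
          --scopes "/subscriptions/<subscription-id>"

The command prints an appId, password and tenant, which map to the Client Id, Client Secret and Tenant Id fields that the Azure Monitor data source asks for.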

Why Grafana?

Grafana provides more visualization options than the Azure Portal. It also supports multiple data sources. One can combine data from multiple sources in a single dashboard. Grafana is designed for analyzing and visualizing metrics such as system CPU, memory, disk and I/O utilization. Users can create comprehensive charts with smart axis formats (such as lines and points) as a result of Grafana’s fast, client-side rendering — even over long ranges of time.

Grafana dashboards are what made Grafana such a popular visualization tool. Visualizations in Grafana are called panels, and users can create a dashboard containing panels for different data sources. Grafana supports graph, singlestat, table, heatmap and freetext panel types. Grafana users can make use of a large ecosystem of ready-made dashboards for different data types and sources. Grafana has no time series storage support. Grafana is only a visualization solution. Time series storage is not part of its core functionality.

Some of the features of Grafana are as follows

  • Optimized for Time series
  • Can pull data from Azure Metrics, Log Analytics and Application Insights
  • Azure Data Explorer (formerly known as Kusto) plugin also released.
  • Rich ecosystem of plugins for data sources and dashboards.
  • Open Source, easy to onboard using Docker, Azure App Service etc.
Azure-app-service

Some of the requirements of Grafana are described below.

  • Azure SPN (Service Principal Name) with reader access to subscription
  • Deploy in Azure web apps.
  • Data source plugin “grafana-azure-monitor-datasource”
  • Supports AD integration via LDAP
  • Easy to export/import and templatize
  • Very DevOps friendly
  • Huge collection of panels https://play.grafana.org
Grafana

Azure Monitor Data Source For Grafana

Azure Monitor is the platform service that provides a single source for monitoring Azure resources. The Azure Monitor Data Source plugin supports Azure Monitor, Azure Log Analytics and Application Insights metrics in Grafana.
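
Installing the plugin is a one-liner with the Grafana CLI; a minimal sketch, assuming Grafana was installed from a package on a systemd-based Linux host:

        grafana-cli plugins install grafana-azure-monitor-datasource
        ## restart Grafana so the new plugin is picked up
        sudo systemctl restart grafana-server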

Features

  • Support for all the Azure Monitor metrics
    • includes support for the latest API version that allows multi-dimensional filtering for the Storage and SQL metrics.
    • Automatic time grain mode which will group the metrics by the most appropriate time grain value
  • Application Insights metrics
    • Write raw log analytics queries, and select x-axis, y-axis, and grouped values manually.
    • Automatic time grain support
  • Support for Log Analytics (both for Azure Monitor and Application Insights)
  • You can combine metrics from both services in the same graph.
Grafana-graph

Azure Monitor for VMs provides an in-depth view of VM health, performance trends, and dependencies. Azure Monitor for VMs includes a set of performance charts that target several key performance indicators (KPIs) to help you determine how well a virtual machine is performing. Azure Monitor for VMs is focused on the operating system as manifested through the processor, memory, network adapters, and disks.

Azure Dashboards

Azure dashboards allow you to combine different kinds of data, including both metrics and logs, into a single pane in the Azure portal. You can optionally share the dashboard with other Azure users. Elements throughout Azure Monitor can be added to an Azure dashboard in addition to the output of any log query or metrics chart. Azure Monitor is the single source for monitoring Azure resources, and it is Azure’s time series database for all Azure metrics.

Some of the important aspects of Azure Dashboard

  • No setup required, already available within Azure Portal.
  • No zoom in / zoom out for metrics
  • All data from Azure resources.
  • Log Analytics/AI queries cannot be parameterized based on Dashboard selection.
  • Query results can be pinned to dashboards
  • Good panels are tied to specific products and can’t be customized or reused elsewhere.
    • Eg. percentile panels is only available in “Container Insights” and VM insights.
    • The panel cannot be used against “Log Analytics” source or Metric source.

Some of the features of Azure Dashboard are as follows

  • Supports visualizing most Azure resources
  • OOB Integrated with Azure RBAC
  • Supports Log Analytics, App Insights and Metrics
  • No Auto refresh per panel
  • No Zoom in Zoom out.
  • Dashboard queries don’t support variables

Azure Dashboards (VM insights/ Container Insights)

  • These tiles can only be accessed by navigating to the VM resource.
  • They cannot be pinned as is, but the detailed version of this can be pinned.
  • No zoom in zoom out capability.

Azure Dashboards – Metrics

  • These are pinnable
  • Don’t support percentiles
  • No drill-down ability
  • Each Panel is hard coded to a specific data source even if they might be the same behind the scenes.

Comparison between Grafana and Azure Dashboard is shown below.

Azure Dashboard
  • Multiple Azure resource types
  • Limited configuration options. Requires JSON editing
  • Application Insights → OOB Azure Dashboard
  • Only static queries
  • No setup required
  • Not intuitive for overlaying.
Grafana
  • Mostly Time Series
  • Highly configurable
  • Global variables as filters
  • Dashboard and individual panel refresh.
  • Supports query macros
  • Setup required (minimal)
  • Intuitive overlays

Introduction to Alexa

Amazon’s Alexa is the voice-activated, interactive AI bot, or intelligent personal assistant in the cloud, that lets people speak with their Amazon Echo, Echo Dot and other Amazon smart home devices. Alexa is designed to respond to a number of commands and converse with people.

Alexa Skills are apps that give Alexa even more abilities. These skills can let her speak to more devices or websites. When the Alexa device is connected to the internet through Wi-Fi or Bluetooth, it wakes up when you merely say “ALEXA”. Alexa Skills radically expand the bot’s repertoire, allowing users to perform more actions with voice-activated control through Alexa.

Overview of Alexa Skill

The most important part of an Alexa skill is its interaction design. Alexa skills don’t have visual feedback like web or desktop applications and instead guide the user through the skill using voice. Every Alexa skill reply needs to tell the user clearly what the next options are.

Functional Architecture

An Alexa skill is a small application that interacts with Alexa via an AWS Lambda function.

Functional Architecture

Designing the Alexa Skill

The most important aspect of the skill is its Vocal interface. The skill should be interacting naturally with the user. The components of Alexa Skill are :

  • Alexa requires a word, often called the wake phrase, which alerts the device that it can expect a command immediately after. The default wake phrase is ALEXA. It can also be Amazon, Echo or Computer.
  • Launch Phrase is the word that tells Alexa to trigger a skill. Examples of launch phrases are “OPEN”, “ASK”, “START” and “LAUNCH”.
  • Invocation name is the name of the skill.
  • Intents are the goals that the user is trying to achieve by invoking the skill.
  • Utterance tells Alexa what the skill should do. Apart from static utterances such as Start and Launch, dynamic commands can be added. These dynamic commands are called slots.
  • Each intent can contain one or more slots. A Slot is the variable that is parsed and exposed to the application code.

Sampleutterances

Alexa has a built-in natural language processing engine. To map a verbal phrase to an intent, Alexa handles the complexity of natural language processing with the help of a manually curated file, SampleUtterances.txt.

The first word in each line of SampleUtterances.txt is the intent name. The application code reads the value of the intent name and responds appropriately. Following the intent name is the phrase that the user says to achieve that intent. The user’s phrase may also contain variable parts, which are defined as slots in the intent, and the application is free to react differently based on the presence or value of a slot. To give Alexa the best chance of understanding users, it is recommended to include as many sample utterances as possible. Depending on the skill, there could be any number of ever-changing sample utterances.

The below example sums up the entire vocal interface

entire vocal interface

Build and Publish a new skill

Building and publishing a new skill in Alexa comprises the steps below:

1. Create and Configure Skill :

Create an Alexa skill using  https://developer.amazon.com/alexa-skills-kit. This will open the skill information where we can specify the name of the skill and the invocation name.

2. Create Interaction Model:

Interaction model is a set of rules that defines the way the user interacts with your skill. As a part of an interaction model, Intents, Utterances  are defined. The intent schema should be in JSON format and it should define an array of intents, each with a name, and an optional list of dynamic parts — slots. Alexa will automatically train itself with the provided interaction model.

3. Coding the Backend system:

Once the interaction model has been designed Code and deployment of the Lambda function has to be done.

For each intent, an input/output contract has to be implemented. The input is an IntentRequest which is a representation of the user’s request and includes all the slot values.

The response from alexa can be of  multiple ways.

  • Ask the user a question and wait for response.
  • Give the details to the user and shut down.
  • Say nothing and shut down.

Alexa can either respond verbally or the response could be displayed on the phone.

4. Deploying the Backend system:

The skill can be deployed as an AWS Lambda function with code written in Java, Node.js, Python or C#. The simplest approach would be to code it in Node.js.
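
As a rough sketch of what that deployment looks like from the command line; the function name, IAM role ARN, runtime and zip file below are illustrative placeholders rather than values from this post:

        ## package the Node.js handler and create the Lambda function
        zip -r skill.zip index.js node_modules
        aws lambda create-function \
          --function-name alexa-hello-skill \
          --runtime nodejs12.x \
          --handler index.handler \
          --role arn:aws:iam::123456789012:role/alexa-skill-lambda-role \
          --zip-file fileb://skill.zip

        ## allow the Alexa Skills Kit to invoke the function
        aws lambda add-permission \
          --function-name alexa-hello-skill \
          --statement-id alexa-skill-trigger \
          --action lambda:InvokeFunction \
          --principal alexa-appkit.amazon.com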

5. Testing the Skill:

Testing of the Skill can either be done through the test simulator available in the Developer console account or through the device connected to the development account.

6. Publishing the Skill:

To publish the skill, it has to be submitted by filling out the “Publishing Information” and the “Privacy & Compliance” sections.

Spring Initializer – using Java and Maven

Spring Initializr is a web tool provided by Spring on the official site https://start.spring.io/. We can create a Spring Boot project by providing the project details.

In the below example, we added the springboot-starter-web dependency to write REST Endpoints.

spring_initializer_img

After providing the Group, Artifact, Dependencies, Build Project, Platform and Version, click the Generate Project button. A zip file gets downloaded; unzip it to get the project.
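
The same project can also be generated without the browser by calling the Spring Initializr API directly; a minimal sketch, where the project name demo and the output file names are placeholders:

        ## generate a Maven project with the web starter and unpack it
        curl https://start.spring.io/starter.zip \
          -d dependencies=web \
          -d type=maven-project \
          -d name=demo \
          -o demo.zip
        unzip demo.zip -d demo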

The maven file pom.xml will have the Web dependency we had selected above.

Web_dependency_img

Note that only the Spring Boot starter parent has a version number. Spring Boot starter web doesn’t need a version, as it is automatically configured based on the version of the parent.

You can find the main class file under the src/main/java directory with the default package.

directories_img

To write a simple Hello World Rest Endpoint in the Spring Boot Application main class file itself, follow the steps shown below:

  • Firstly, add the @RestController annotation at the top of the class.
  • Now, write a Request URI method with @RequestMapping annotation.
  • Then, the Request URI method should return the Hello World string.
application_main_img

Create an executable JAR by executing the below Maven command in the folder containing pom.xml:
C:\Users\SaravananP\Downloads\demo> mvn clean install

install_img

The .jar file will be created in the target folder as indicated above

Run the jar file using java -jar and verify the results

verify_img

result_img

Application Properties

In the above examples, we have seen that Spring boot automatically configured Tomcat to run in port 8080. We can override this by specifying the port in the file src\main\resources\application.properties

port_img

If we rebuild the jar and execute it, we will now get an error at http://localhost:8080 and instead see the Hello World message at http://localhost:9090
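
A quick sketch of that rebuild-and-run cycle from the command line; the jar name below is the default Maven artifact name and may differ in your build. Spring Boot also accepts the port as a command-line override, which is handy for one-off runs:

        mvn clean install
        ## runs on the port configured in application.properties (9090 here)
        java -jar target/demo-0.0.1-SNAPSHOT.jar
        ## or override the port for a single run without editing the properties file
        java -jar target/demo-0.0.1-SNAPSHOT.jar --server.port=9090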

404_img

result_img1

Install and Run SQL Server Docker Container on Mac

Like most people, I use Mac, Windows as well as Linux OS for development and testing purposes. Primarily I use Mac for development. I have a few projects which use SQL Server as the data storage layer. Setting up the Docker container on Mac and opening up the ports was pretty easy and doesn’t take more than 10 minutes.

Steps followed :
  • Install Docker
  • Pull SQL Server Docker Image
  • Run SQL Server Docker Image
  • Install mssql Client
  • Install Kitematic
  • Open the Ports to connect to SQL Server from the network
  • Setup port forwarding to enable access outside the network
Install Docker :

Get the Docker dmg image and install it. Just follow the prompts; it’s very straightforward. 
https://docs.docker.com/docker-for-mac/install/#download-docker-for-mac https://download.docker.com/mac/stable/Docker.dmg

Once you have installed docker , you can verify the installation and version.

                bash-3.2$ docker -v
        Docker version 17.09.0-ce, build afdb6d4 
Pull SQL Server Docker Image ( DEV Version )
                docker pull microsoft/mssql-server-linux:2017-latest 
Create SQL Server Container from the Image and Expose it on port 1433 ( Default Port )
                docker run -d --name macsqlserver -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=Passw1rd' -e 'MSSQL_PID=Developer' -p 1433:1433 microsoft/mssql-server-linux:2017-latest 

-d: this launches the container in daemon mode, so it runs in the background

--name name_your_container (macsqlserver): give your Docker container a friendly name, which is useful for stopping and starting containers from the Terminal.

-e ‘ACCEPT_EULA=Y’: this sets an environment variable in the container named ACCEPT_EULA to the value Y. This is required to run SQL Server for Linux.

-e ‘SA_PASSWORD=Passw1rd’: this sets an environment variable for the sa database password. Set this to your own strong password. Also required.

-e ‘MSSQL_PID=Developer’: this sets an environment variable to instruct SQL Server to run as the Developer Edition.

-p 1433:1433: this maps the local port 1433 to the container’s port 1433. SQL Server, by default, listens for connections on TCP port 1433.

microsoft/mssql-server-linux: this final parameter tells Docker which image to use

Install SQL Client for MAC

If you don’t have npm installed in Mac, install homebrew and node.

                ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
        brew install node
        node -v
        npm -v 
                $ npm install -g sql-cli
         
        /usr/local/bin/mssql -> /usr/local/lib/node_modules/sql-cli/bin/mssql
        /usr/local/lib
        └── sql-cli@0.6.2
         
        $ npm i -g npm 
Connect to SQL Server Instance
                $ mssql -u sa -p Passw1rd
        Connecting to localhost...done
         
        sql-cli version 0.6.2
        Enter ".help" for usage hints.
        mssql> select * from sys.dm_exec_connections 
Get External Tools to Manage Docker

Kitematic

https://kitematic.com/

Open Up the Firewall to connect to SQL Server from outside the Host

Ensure your firewall is configured to allow connections to SQL Server. I turned off “Block all incoming connections” and enabled “Automatically allow downloaded signed software to receive incoming connections”. Without proper firewall configuration, you won’t be able to connect to SQL Server from outside the host.

Ensure Firewall allows the incoming connections to the Docker
Connecting from the Internet ( Port forwarding Setup )

Let’s say you want to connect to the SQL Server you set up from outside the network, or from anywhere via the internet; you can set up port forwarding.

Get your public-facing IP and set up port forwarding for port 1433 (the SQL Server port you used for your Docker container). If it’s set up correctly, you should be able to telnet into that port to verify the connectivity.

        telnet 69.11.122.159 1433 

Unless you absolutely require it, it’s a very bad idea to expose SQL Server to the internet. It should stay behind the network; only your web server should be accessible via the internet.

Troubleshooting :

While launching the Docker container, if you get an error saying that there isn’t enough memory to launch the SQL Server container, go ahead and increase the memory allocation for Docker.

  • This image requires Docker Engine 1.8+ in any of their supported platforms.
  • At least 3.25 GB of RAM. Make sure to assign enough memory to the Docker VM if you’re running on Docker for Mac or Windows.

I have setup this way.

Docker Memory configs

If you don’t provision enough memory, you will get an error like this.

Docker SQL Server Memory Error
Look into Docker logs

The following commands (docker ps -a and docker logs macsqlserver) show the list of running containers and the Docker logs.

        $ docker ps -a
CONTAINER ID        IMAGE                                      COMMAND                  CREATED             STATUS              PORTS                    NAMES
9ea3a24563f9        microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   About an hour ago   Up About an hour    0.0.0.0:1433->1433/tcp   macsqlserver
$ docker logs macsqlserver
2017-10-08 23:06:52.29 Server      Setup step is copying system data file 
'C:\templatedata\master.mdf' to '/var/opt/mssql/data/master.mdf'.
2017-10-08 23:06:52.36 Server      Setup step is copying system data file 
'C:\templatedata\mastlog.ldf' to '/var/opt/mssql/data/mastlog.ldf'.
2017-10-08 23:06:52.36 Server      Setup step is copying system data file 
'C:\templatedata\model.mdf' to '/var/opt/mssql/data/model.mdf'.
2017-10-08 23:06:52.38 Server      Setup step is copying system data file 
'C:\templatedata\modellog.ldf' to '/var/opt/mssql/data/modellog.ldf'.
 
Security:

I highly recommend creating least-privileged accounts and disabling the SA login. If you are exposing your SQL Server to the internet, there are a ton of hacking and pentest tools which use the sa login for brute-force attacks.

Bulk Load Data Files in S3 Bucket into Aurora RDS

We typically get data feeds from our clients (usually about 5 – 20 GB worth of data). We download these data files to our lab environment and use shell scripts to load the data into Aurora RDS. We wanted to avoid unnecessary data transfers and decided to set up a data pipeline to automate the process and use S3 buckets for file uploads from the clients.

In theory it’s a very simple process to set up a data pipeline that loads data from an S3 bucket into an Aurora instance. Even though it sounds trivial, setting up this process is a convoluted, multi-step affair. It’s not as simple as it sounds. Welcome to the managed services world.

STEPS INVOLVED :
  • Create ROLE and Attach S3 Bucket Policy :
  • Create Cluster Parameter Group
  • Modify Custom Parameter Groups to use ROLE
  • REBOOT AURORA INSTANCE
GRANT AURORA INSTANCE ACCESS TO S3 BUCKET

By default Aurora cannot access S3 buckets, and that’s just a common-sense default to reduce the surface area for better security.

For EC2 machines you can attach a role, and the EC2 machines can access other AWS services on behalf of the role assigned to the instance. The same method is applicable for Aurora RDS: you can associate a role with Aurora RDS that has the required permissions on the S3 bucket.

There is a ton of documentation on how to create a role and attach policies; it’s a widely adopted best practice in the AWS world. Based on the AWS documentation, AWS rotates the access keys attached to these roles automatically. From a security aspect, it’s a lot better than using hard-coded access keys.

In the traditional datacenter world, you would typically run a few configuration commands to change configuration options (think of sp_configure in SQL Server).

In the AWS RDS world, it’s trickier. By default a configuration gets attached to your Aurora cluster. If you need to override any default configuration, you have to create your own DB Cluster Parameter Group and modify your RDS instance to use it. Then you can edit your configuration values.

The way you attach a role to Aurora RDS is through the cluster parameter group.

These three configuration options are related to interaction with S3 Buckets.

  • aws_default_s3_role
  • aurora_load_from_s3_role
  • aurora_select_into_s3_role

Get the ARN for your role and change the above configuration values from the default empty string to the role ARN value.

Then you need to modify your Aurora instance to use the role. It should show up in the drop-down menu in the modify-role tab.
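
For illustration, the same steps can be scripted with the AWS CLI; a rough sketch in which the cluster id, parameter group name, instance id and role ARN are placeholders for your own values:

        ## attach the IAM role to the Aurora cluster
        aws rds add-role-to-db-cluster \
          --db-cluster-identifier my-aurora-cluster \
          --role-arn arn:aws:iam::123456789012:role/aurora-s3-access

        ## point the load-from-S3 parameter at the same role in the custom
        ## DB cluster parameter group (takes effect after reboot)
        aws rds modify-db-cluster-parameter-group \
          --db-cluster-parameter-group-name my-aurora-params \
          --parameters "ParameterName=aurora_load_from_s3_role,ParameterValue=arn:aws:iam::123456789012:role/aurora-s3-access,ApplyMethod=pending-reboot"

        ## reboot the instance so the new cluster parameter values take effect
        aws rds reboot-db-instance --db-instance-identifier my-aurora-instance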

GRANT AURORA LOGIN LOAD FILE PERMISSION
 
        
        GRANT LOAD FROM S3 ON *.* TO user@domain-or-ip-address
        GRANT LOAD FROM S3 ON *.* TO 'aurora-load-svc'@'%' 
REBOOT AURORA INSTANCE

Without a reboot you will spend a lot of time troubleshooting. You need to reboot the Aurora instance for the new cluster parameter values to take effect.

After this you will be able to execute LOAD DATA FROM S3 into Aurora.

Screen Shots :
Create ROLE and Attach Policy :


Attach S3 Bucket Policy :

Create Parameter Group :

Modify Custom Parameter Groups

Modify AURORA RDS Instance to use ROLE

Troubleshooting :
Errors :

Error Code: 1871. S3 API returned error: Missing Credentials: Cannot instantiate S3 Client 0.078 sec

This usually means the Aurora instance can’t reach the S3 bucket. Make sure you have applied the role and rebooted the instance.

Sample BULK LOAD Command :

You could use the following sample scripts to test your setup.

 
        
        LOAD DATA FROM S3 's3://yourbucket/allusers_pipe.txt'
        INTO TABLE ETLStage.users
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '\n'
        (@var1, @var2, @var3, @var4, @var5, @var6, @var7, @var8, @var9, @var10, @var11, @var12, @var13, @var14, @var15, @var16, @var17, @var18)
        SET
        userid = @var1,
        username = @var2,
        firstname = @var3,
        lastname = @var4,
        city=@var5,
        state=@var6,
        email=@var7,
        phone=@var8,
        likesports=@var9,
        liketheatre=@var10,
        likeconcerts=@var11,
        likejazz=@var12,
        likeclassical=@var13,
        likeopera=@var14,
        likerock=@var15,
        likevegas=@var16,
        likebroadway=@var17,
        likemusicals=@var18 

Sample File in S3 Public Bucket : s3://awssampledbuswest2/tickit/allusers_pipe.txt
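
To try the sample against your own bucket, one option (a sketch; the destination bucket name is a placeholder) is to copy the public sample file across with the AWS CLI:

        aws s3 cp s3://awssampledbuswest2/tickit/allusers_pipe.txt s3://yourbucket/allusers_pipe.txt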

 
        
        SELECT * FROM ETLStage.users INTO OUTFILE S3 's3-us-west-2://s3samplebucketname/outputestdata'
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n'
        MANIFEST ON
        OVERWRITE ON; 
 
        
        create table users_01(
        userid integer not null primary key,
        username char(8),
        firstname varchar(30),
        lastname varchar(30),
        city varchar(30),
        state char(2),
        email varchar(100),
        phone char(14),
        likesports varchar(100),
        liketheatre varchar(100),
        likeconcerts varchar(100),
        likejazz varchar(100),
        likeclassical varchar(100),
        likeopera varchar(100),
        likerock varchar(100),
        likevegas varchar(100),
        likebroadway varchar(100),
        likemusicals varchar(100)) 

This is a continuation of the blog post that covers how to set up and run Image2Docker on local machines.

Local Machines
  • This mode looks for IIS installed on the local machine and converts the IIS sites / virtual directories / applications to Dockerfiles and associated artifacts.
  • Run the following command
     
                    
                     ConvertTo-Dockerfile `
                     -Local `
                     -OutputPath {{OutputPath}} `
                     -Artifact IIS `
                     -Verbose 
  • Local parameter is used for IIS discovery on the local machine.
  • OutputPath parameter specifies the location to store the generated Dockerfile and associated artifacts.
  • Artifact parameter specifies what artifact to inspect. In our case this is IIS.
  • Verbose parameter is optional and it will give all the verbose logs.
  • Following is the sample command
     
                    
                    ConvertTo-Dockerfile -Local -OutputPath c:\docker_repo\iis -Artifact IIS -Verbose 

When it completes, the cmdlet generates a Dockerfile which turns that web server into a Docker image. The Dockerfile has instructions to install IIS and ASP.NET, copy in the website content, and create the sites in IIS.

Disk Images
  • After installing the Image2Docker PowerShell module, you will need one or more valid .vhdx or .wim files (the “source image”). To perform a scan of a valid VHDX or WIM image file, simply call the ConvertTo-Dockerfile command and specify the -ImagePath parameter, passing in the fully-qualified filesystem path to the source image.
  • Run the following command
     
                    
                     ConvertTo-Dockerfile `
                     -ImagePath {{ImagePath}} `
                     -OutputPath {{OutputPath}} `
                     -Artifact IIS `
                     -Verbose 
  • ImagePath parameter, specifying the location of the disk image. {{ImagePath}} -> Provide your valid .vhdx or .wim images path stored in the local machine. The disk image must be available locally.
  • OutputPath parameter specifies the location to store the generated Dockerfile and associated artifacts.
  • Artifact parameter specifies what artifact to inspect. In our case this is IIS.
  • Verbose parameter is optional and it will give all the verbose logs.
  • Following is the sample command
     
                    
                    ConvertTo-Dockerfile -ImagePath C:\vhds\qa-webserver-01.vhd -OutputPath c:\docker_repo\iis -Artifact IIS -Verbose 

The qa-webserver-01.vhd contains two websites: one is an ASP.NET MVC app and the other is a Web API.

When it completes, the cmdlet generates a Dockerfile which turns that web server into a Docker image. The Dockerfile has instructions to install IIS and ASP.NET, copy in the website content, and create the sites in IIS.

Image2Docker extracts the website content for the ASP.NET MVC app and the Web API and generates the Dockerfile covering the websites configured on the image file.

Cloud computing is providing developers and IT departments with the ability to focus on what matters most and avoid undifferentiated work like procurement, maintenance, and capacity planning. As cloud computing has grown in popularity, several different models and deployment strategies have emerged to help meet specific needs of different users. Each type of cloud service, and deployment method, provides you with different levels of control, flexibility, and management. Understanding the differences between Infrastructure as a Service, Platform as a Service, and Software as a Service, as well as what deployment strategies you can use, can help you decide what set of services is right for your needs.

 

 
Cloud Computing Models

There are three main models for cloud computing. Each model represents a different part of the cloud computing stack.

 
cloud-computing-models_iaas

 
Infrastructure as a Service (IaaS):

Infrastructure as a Service, sometimes abbreviated as IaaS, contains the basic building blocks for cloud IT and typically provides access to networking features, computers (virtual or on dedicated hardware), and data storage space. Infrastructure as a Service provides you with the highest level of flexibility and management control over your IT resources and is most similar to the existing IT resources that many IT departments and developers are familiar with today.

 
cloud-computing-models_paas

 
Platform as a Service (PaaS):

Platforms as a service remove the need for organizations to manage the underlying infrastructure (usually hardware and operating systems) and allow you to focus on the deployment and management of your applications. This helps you be more efficient as you don’t need to worry about resource procurement, capacity planning, software maintenance, patching, or any of the other undifferentiated heavy lifting involved in running your application.

 
cloud-computing-models_saas

 
Software as a Service (SaaS):

Software as a Service provides you with a completed product that is run and managed by the service provider. In most cases, people referring to Software as a Service are referring to end-user applications. With a SaaS offering you do not have to think about how the service is maintained or how the underlying infrastructure is managed; you only need to think about how you will use that particular piece of software. A common example of a SaaS application is web-based email, where you can send and receive email without having to manage feature additions to the email product or maintain the servers and operating systems that the email program is running on.

We are evaluating pros and cons of different hosting solutions for SQL Server which best suits our business needs.

Our business needs

Our demand is very predictable, seasonal demand. We are very small and can’t afford a dedicated team for managing database infrastructure (no DBA team). Customers have sky-high expectations on availability and reliability for about 2 months a year, and a few minutes of downtime during the peak period can wreak havoc on our business. We have a fixed budget with very little wiggle room. Our plan is to evaluate AWS SQL Server RDS, Azure RDS and managed solutions from a hosting provider, and to evaluate each option in these categories:

  1. Performance and Reliability
  2. Ability to scale up during peak loads
  3. Cost ( Based on Network , Storage, Memory and CPU )
  4. Operations Efficiency
  5. Compliance
Infrastructure Requirements:

  • SQL Server Enterprise Edition, since we use enterprise features
  • AlwaysOn Availability Group for High Availability
  • Geo Replication or Multi Availability Zone implementation for cloud-based databases
  • Ability to route Read/Write workloads
  • 128 GB RAM minimum
  • 1 – 2 TB storage, with 500 GB of SSD for the TempDB database and high-volume tables
  • Memory-Optimized OLTP support, which needs SQL Server 2016
  • Ability to handle ~30K IOPS during peak load

Amazon AWS SQL Server RDS

RDS Pricing Link – AWS SQL Server RDS Pricing: http://www.ec2instances.info/rds/?selected=db.r3.8xlarge

Enterprise Edition – Single-AZ Deployment (Memory Optimized Instances – Current Generation)
Instance | Price Per Hour
db.r3.2xlarge | $5.810
db.r3.4xlarge | $11.404
db.r3.8xlarge | $19.271

Enterprise Edition – Multi-AZ Deployment (Memory Optimized Instances – Current Generation)
Instance | Price Per Hour
db.r3.2xlarge | $11.620
db.r3.4xlarge | $22.808
db.r3.8xlarge | $38.542

AWS SQL Server RDS configuration evaluated: On-Demand for SQL Server (License Included), Multi-AZ Deployment, Region: US East (N. Virginia), Memory Optimized Instances – Current Generation, priced per hour. RAM: 244 GB, 10 Gigabit network, 32 vCPU, 20,000 Provisioned IOPS.

db.r3.8xlarge | 244 GB RAM | 2 x 320 GB SSD | Intel Xeon E5-2670 v2 (Ivy Bridge) | 32 vCPUs | 10 Gigabit

https://aws.amazon.com/rds/sqlserver/pricing/
Azure Pricing Calculator

Azure performance is measured in DTUs. We have been collecting our performance metrics during load tests. The following link provides a lightweight utility to convert perfmon counters to Azure DTUs.

Perfmon to Azure DTU calculator

Understanding DTUs, based on the Microsoft definition: https://azure.microsoft.com/en-us/documentation/articles/sql-database-service-tiers/

The Database Transaction Unit (DTU) is the unit of measure in SQL Database that represents the relative power of databases based on a real-world measure: the database transaction. We took a set of operations that are typical for an online transaction processing (OLTP) request, and then measured how many transactions could be completed per second under a fully loaded state.

Azure RDS Pricing Calculator | Azure SQL Server Pricing Calculator | Azure Options for SQL Server
https://azure.microsoft.com/en-us/pricing/details/sql-database/
Basic
eDTUs per pool | Max storage per pool | Max DBs per pool | Max eDTUs per database | Price
100 | 10 GB | 200 | 5 | ~$149/mo
200 | 20 GB | 400 | 5 | ~$298/mo
400 | 39 GB | 400 | 5 | ~$595/mo
800 | 78 GB | 400 | 5 | ~$1,198/mo
1200 | 117 GB | 400 | 5 | ~$1,800/mo

Standard
eDTUs per pool | Max storage per pool | Max DBs per pool | Max eDTUs per database | Price
100 | 100 GB | 200 | 100 | ~$223/mo
200 | 200 GB | 400 | 100 | ~$446/mo
400 | 400 GB | 400 | 100 | ~$900/mo
800 | 800 GB | 400 | 100 | ~$1,800/mo
1200 | 1.2 TB | 400 | 100 | ~$2,701/mo

Premium
eDTUs per pool | Max storage per pool | Max DBs per pool | Max eDTUs per database | Price
125 | 250 GB | 50 | 125 | ~$697/mo
250 | 500 GB | 50 | 250 | ~$1,399/mo
500 | 750 GB | 50 | 500 | ~$2,790/mo
1000 | 750 GB | 50 | 1000 | ~$5,580/mo
1500 | 750 GB | 50 | 1000 | ~$8,370/mo
AWS SLA Summary
AWS Service | SLA | Condition | Service Credit | SLA Resource
RDS | 99.95% | Less than 99.95% but equal to or greater than 99.0% | 10% | https://aws.amazon.com/rds/sla/
RDS | 99.95% | Less than 99.0% | 25% | https://aws.amazon.com/rds/sla/
S3 | 99.9% | Equal to or greater than 99.0% but less than 99.9% | 10% | https://aws.amazon.com/s3/sla/
S3 | 99.9% | Less than 99.0% | 25% | https://aws.amazon.com/s3/sla/
EC2 | 99.95% | Less than 99.95% but equal to or greater than 99.0% | 10% | https://aws.amazon.com/ec2/sla/
EC2 | 99.95% | Less than 99.0% | 30% | https://aws.amazon.com/ec2/sla/
Route 53 | 100% | 5 – 30 minutes of downtime in a billing cycle | 1 day service credit | https://aws.amazon.com/route53/sla/
Route 53 | 100% | 31 minutes – 4 hours of downtime in a billing cycle | 7 days service credit | https://aws.amazon.com/route53/sla/
Route 53 | 100% | More than 4 hours of downtime in a billing cycle | 30 days service credit | https://aws.amazon.com/route53/sla/
SLA Percentages
Availability % | Downtime/Month | Downtime/Week | Downtime/Day
90% (“one nine”) | 72 hours | 16.8 hours | 2.4 hours
95% | 36 hours | 8.4 hours | 1.2 hours
97% | 21.6 hours | 5.04 hours | 43.2 minutes
98% | 14.4 hours | 3.36 hours | 28.8 minutes
99% (“two nines”) | 7.20 hours | 1.68 hours | 14.4 minutes
99.5% | 3.60 hours | 50.4 minutes | 7.2 minutes
99.8% | 86.23 minutes | 20.16 minutes | 2.88 minutes
99.9% (“three nines”) | 43.8 minutes | 10.1 minutes | 1.44 minutes
99.95% | 21.56 minutes | 5.04 minutes | 43.2 seconds
99.99% (“four nines”) | 4.38 minutes | 1.01 minutes | 8.66 seconds

Useful Link – Cloud Provider Service Availability: https://cloudharmony.com/status

According to the AWS documentation, the first time a DB instance is started and accesses an area of disk for the first time, the process can take longer than all subsequent accesses to the same disk area. This is known as the “first touch penalty.” Once an area of disk has incurred the first touch penalty, that area of disk does not incur the penalty again for the life of the instance, even if the DB instance is rebooted, restarted, or the DB instance class changes. Note that a DB instance created from a snapshot, a point-in-time restore, or a read replica is a new instance and does incur this first touch penalty.

Reference: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html
I captured the number of cached pages per database on our SQL Server RDS instance, rebooted the instance, and captured the cached pages again. Based on the documentation, the cached pages should still be available and shouldn’t be affected by the reboot process. It’s such a neat feature, one that makes life a lot easier for DBAs responding to unexpected situations. But I noticed significant differences between the number of cached pages before and after the reboot.

Cached Pages Before and After SQL Server RDS Reboot
DBName | Bef_Buf_Pages | Size_MB | Aft_Buf_Pages | Size_MB
DatabaseOne | 558482 | 4363 | 596 | 4
DatabaseTwo | 1017 | 7 | 487 | 3
DatabaseThree | 609 | 4 | 201 | 1
master | 190 | 1 | 107 | 0
model | 18 | 0 | 37 | 0
msdb | 882 | 6 | 284 | 2
rdsadmin | 1253 | 9 | 87 | 0
DatabaseFour | 133 | 1 | 59 | 0
Resource Database | 1877 | 14 | 319 | 2
tempdb | 779280 | 6088 | 123 | 0

For DatabaseOne, cached pages dropped from 558482 to 596. I am not sure whether others have encountered the same issue, and I’m not sure what to make of the first touch penalty promise to keep the cache intact. Maybe it doesn’t hold for SQL Server RDS. 🙂

Structured, Semi-structured and Unstructured data

Big Data includes huge volume, high velocity, and an extensible variety of data. There are three types: structured data, semi-structured data, and unstructured data.

  1. Structured data is data whose elements are addressable for effective analysis. It has been organised into a formatted repository, typically a database. Example: relational database tables.
  2. Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyse. With some processing it can be stored in a relational database, though for some kinds of semi-structured data this can be very hard. Example: XML data, JSON.
  3. Unstructured data is data that is not organised in a pre-defined manner or does not have a pre-defined data model, and is therefore not a good fit for a mainstream relational database. There are alternative platforms for storing and managing unstructured data; it is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications. Example: Word, PDF, text, media logs.

NoSQL (Not Only SQL database)

NoSQL is an approach to database design that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats. NoSQL, which stands for “not only SQL,” is an alternative to traditional relational databases, in which data is placed in tables and the data schema is carefully designed before the database is built. NoSQL databases are especially useful for working with large sets of distributed data.

Key-value stores, or key-value databases, implement a simple data model that pairs a unique key with an associated value.

Document databases, also called document stores, store semi-structured data and descriptions of that data in document format. They allow developers to create and update programs without needing to reference master schema. Use of document databases has increased along with use of JavaScript and the JavaScript Object Notation (JSON).

Wide-column stores organize data tables as columns instead of as rows.

Graph data stores organize data as nodes, which are like records in a relational database, and edges, which represent connections between nodes.

Couchbase

Couchbase Server, originally known as Membase, is an open-source, distributed (shared-nothing architecture) multi-model NoSQL document-oriented database software package that is optimized for interactive applications. Couchbase Server is designed to provide easy-to-scale key-value or JSON document access with low latency and high sustained throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.

Couchbase Inc. describes Couchbase as an Engagement Database, a new category of database that enables enterprises to continually create and reinvent the customer experience. Unlike traditional databases, the Engagement Database taps into dynamic data, at any scale and across any channel or device, to liberate data’s full potential at a time when the strategic use of data to create exceptional customer experiences has become a key competitive differentiator for businesses.

In Engagement Database architecture data is first cached in memory, replicated for availability and then finally written to disk.

Core features of Couchbase

Data: Couchbase Server stores data as items. Each item consists of a key, by which the item is referenced; and an associated value, which must be either binary or a JSON document.

Buckets, Memory, and Storage: Items are stored in named Buckets; being kept only in memory, or both in memory and on disk.

Services: Services can be deployed to support different forms of data-access. Details are given in next section.

Clusters and Availability: A single node running Couchbase Server is considered a cluster of one node. As successive nodes are initialized, each can be configured to join the existing cluster.

Across the nodes of each cluster, Couchbase data is evenly distributed and replicated: nodes can be removed, and node-failure handled, without data-loss. Data can be selected for replication across clusters residing in different data centres, to ensure high availability.
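
To experiment with a single-node cluster locally, one option (a sketch, assuming Docker is installed; the container name db is arbitrary) is to run the official Couchbase image and finish the setup in the web console at http://localhost:8091:

        docker run -d --name db \
          -p 8091-8096:8091-8096 \
          -p 11210-11211:11210-11211 \
          couchbase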

Services

Couchbase Server provides the following services:

  1. Data: Supports the storing, setting, and retrieving of data-items, specified by key.
  2. Query: Parses queries specified in the N1QL query-language, executes the queries, and returns results. The Query Service interacts with both the Data and Index services.
  3. Index: Creates indexes, for use by the Query and Analytics services.
  4. Search: Create indexes specially purposed for Full Text Search. This supports language-aware searching; allowing users to search for, say, the word beauties, and additionally obtain results for beauty and beautiful.
  5. Analytics: Supports join, set, aggregation, and grouping operations; which are expected to be large, long-running, and highly consumptive of memory and CPU resources.
  6. Eventing: Supports near real-time handling of changes to data: code can be executed both in response to document-mutations, and as scheduled by timers.

N1QL

N1QL (pronounced nickel), is used for manipulating the JSON data in Couchbase, just like SQL manipulates data in RDBMS. It has SELECT, INSERT, UPDATE, DELETE, MERGE statements to operate on JSON data.

The N1QL data model is non-first normal form (N1NF) with support for nested attributes and domain-oriented normalization. The N1QL data model is also a proper superset and generalization of the relational model.

Example
 {
          "email": "[email protected]",
          "friends": [
            {"name":"rick"},
            {"name":"cate"}
          ]
        }
  

Like Query
 SELECT * FROM `bucket` WHERE email LIKE "%@example.org";
  

Array Query
 SELECT * FROM `bucket` WHERE ANY x IN friends SATISFIES x.name = "cate" END;  

Programming model

Couchbase provides client libraries for different programming languages such as Java / .NET / PHP / Ruby / C / Python / Node.js

Following is the core API that Couchbase offers. (in an abstract sense)

 # Get a document by key
        doc = get(key)
        
        # Modify a document, notice the whole document 
        #   need to be passed in
        set(key, doc)
        
        # Modify a document when no one has modified it 
        #  since my last read
        casVersion = doc.getCas()
        cas(key, casVersion, changedDoc)
        
        # Create a new document, with an expiration time 
        #   after which the document will be deleted
        addIfNotExist(key, doc, timeToLive)
        
        # Delete a document
        delete(key)
        
        # When the value is an integer, increment the integer
        increment(key)
        
        # When the value is an integer, decrement the integer
        decrement(key)
        
        # When the value is an opaque byte array, append more 
        #  data into existing value 
        append(key, newData)
        
        # Query the data 
        results = query(viewName, queryParameters)
  

Couchbase Java SDK

The code snippet below shows how the Java SDK may be used for some common operations:

 import com.couchbase.client.java.*;
        import com.couchbase.client.java.document.*;
        import com.couchbase.client.java.document.json.*;
        import com.couchbase.client.java.query.*;
        
        public class Example {
        
            public static void main(String... args) throws Exception {
        
                // Initialize the Connection
                Cluster cluster = CouchbaseCluster.create("localhost");
                cluster.authenticate("username", "password");
                Bucket bucket = cluster.openBucket("bucketname");
        
                // Create a JSON Document
                JsonObject arthur = JsonObject.create()
                    .put("name", "Arthur")
                    .put("email", "[email protected]")
                    .put("interests", JsonArray.from("Holy Grail", "African Swallows"));
        
                // Store the Document
                bucket.upsert(JsonDocument.create("u:king_arthur", arthur));
        
                // Load the Document and print it
                // Prints Content and Metadata of the stored Document
                System.out.println(bucket.get("u:king_arthur"));
        
                // Create a N1QL Primary Index (but ignore if it exists)
                bucket.bucketManager().createN1qlPrimaryIndex(true, false);
        
                // Perform a N1QL Query
                N1qlQueryResult result = bucket.query(
                    N1qlQuery.parameterized("SELECT name FROM `bucketname` WHERE $1 IN interests",
                    JsonArray.from("African Swallows"))
                );
        
                // Print each found Row
                for (N1qlQueryRow row : result) {
                    // Prints {"name":"Arthur"}
                    System.out.println(row);
                }
            }
        }
  

Spring Data Couchbase

The Spring Data Couchbase project provides integration with the Couchbase Server database. Key functional areas of Spring Data Couchbase are a POJO centric model for interacting with Couchbase Buckets and easily writing a Repository style data access layer.

1. Data Model

First create an entity class representing the JSON document to persist.

 @Document
        public class Person {
            @Id
            private String id;
             
            @Field
            @NotNull
            private String firstName;
             
            @Field
            @NotNull
            private String lastName;
             
            @Field
            @NotNull
            private DateTime created;
             
            @Field
            private DateTime updated;
             
            // standard getters and setters
        }
  

2. Couchbase Repository

We declare a repository interface for the Person class by extending CrudRepository<Person, String> and adding a derivable query method:

 public interface PersonRepository extends CrudRepository<Person, String> {
            List<Person> findByFirstName(String firstName);
        }
  

3. Service Layer

For our service layer, we define an interface and an implementation using the Spring Data repository abstraction. Here is our PersonService interface:

 public interface PersonService {
            Person findOne(String id);
            List<Person> findAll();
            List<Person> findByFirstName(String firstName);
             
            void create(Person person);
            void update(Person person);
            void delete(Person person);
        }
        
  

4. Service Implementation
 @Service
        @Qualifier("PersonRepositoryService")
        public class PersonRepositoryService implements PersonService {
             
            @Autowired
            private PersonRepository repo; 
         
            public Person findOne(String id) {
                return repo.findOne(id);
            }
         
            public List<Person> findAll() {
                List<Person> people = new ArrayList<>();
                Iterator<Person> it = repo.findAll().iterator();
                while(it.hasNext()) {
                    people.add(it.next());
                }
                return people;
            }
         
            public List<Person> findByFirstName(String firstName) {
                return repo.findByFirstName(firstName);
            }
         
            public void create(Person person) {
                person.setCreated(DateTime.now());
                repo.save(person);
            }
         
            public void update(Person person) {
                person.setUpdated(DateTime.now());
                repo.save(person);
            }
         
            public void delete(Person person) {
                repo.delete(person);
            }
        }
        
  

Spring Boot

Spring Boot is an open source Java-based framework used to create Micro Services. It is used to build stand-alone and production ready spring applications.

What is Micro Service?

A Micro Service is an architecture that allows developers to develop and deploy services independently. Each service runs in its own process, which achieves the lightweight model needed to support business applications.

Features and benefits of Spring Boot

  • Spring boot provides a flexible way to configure Java Beans, XML configurations, and Database Transactions.
  • It provides a powerful batch processing and manages REST endpoints.
  • In Spring Boot, everything is auto configured; no manual configurations are needed.
  • It offers annotation-based spring application.
  • Eases dependency management.
  • It includes Embedded Servlet Container.
  • It is highly dependent on the starter templates feature.

How Spring Boot works

Spring Boot automatically configures our application based on the dependencies we have added to the project, by using the @EnableAutoConfiguration annotation. For example, if an embedded database such as HSQLDB is on our classpath and we have not manually configured any database connection beans, then Spring Boot auto-configures an in-memory database.

Spring Boot Starters

Handling dependency management is a difficult task for big projects. Spring Boot resolves this problem by providing a set of dependencies for developer’s convenience.

For example, if we want to create a web application with REST Endpoints, it is sufficient if we include spring-boot-starter-web dependency in our project.

Note that all Spring Boot starters follow the same naming pattern spring-boot-starter-*, where * indicates that it is a type of the application.

Example:

Spring Boot Starter Test dependency is used for writing Test cases. Its code is shown below:

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
        </dependency>
                
        

Spring Boot Application

The entry point of the Spring Boot Application is the class containing @SpringBootApplication annotation. This class should have the main method to run the Spring Boot application. @SpringBootApplication annotation includes @EnableAutoConfiguration, @ComponentScan, and @SpringBootConfiguration annotations.

Spring Boot automatically scans all the components included in the project by using @ComponentScan annotation.

Observe the following code for a better understanding:

                
        import org.springframework.boot.SpringApplication;
        import org.springframework.boot.autoconfigure.SpringBootApplication;
        
        @SpringBootApplication
        public class DemoApplication {
            public static void main(String[] args) {
                SpringApplication.run(DemoApplication.class, args);
            }
        }
                
        

Spring Boot – Quick Start – using Groovy

The Spring Boot CLI is a command line tool and it allows us to run the Groovy scripts. Create a simple groovy file which contains the Rest Endpoint script.

Hello.groovy

                
        @Controller
        class Example {
        @RequestMapping("/")
        @ResponseBody
        public String hello() {
        "Hello Spring Boot"
        }
        }
                
        

The above file can be run using the command “spring run Hello.groovy”

spring_command_img

Once we run the groovy file, the required dependencies download automatically and the application starts on Tomcat port 8080, as shown in the screenshot above. You can also see that Spring logs ‘Mapped "{[/]}" onto public java.lang.String Example.hello()’.

We can go to the web browser and hit the URL http://localhost:8080/, and see the output from hello() function as shown below:

hello_spring_img


1. Security and Compliance:

If you are wondering why we are starting with security, then check out this number: $6 trillion. That’s the amount of annual damage cyber crime is predicted to cost us by 2021.

Which is precisely why the first thing you need to check while picking your cloud service provider is their security and compliance levels – both physical and virtual. This includes the geographical location of their data centers and the local laws of the country they are based in.

There are a number of certifications and standards which guarantee the security preparedness of cloud vendors; their validity must be checked and additional investigations must be carried out by checking internal and third-party audits or reports.

You need to do a deep check of:

  • Security infrastructure and procedures followed by the vendor
  • Identity management and authorizations
  • Physical security controls including the process for natural disasters
  • Policies for data back-up and disaster recovery

2. Technical Capabilities

An obvious point, but it still needs to be reiterated.

Your service provider should have a full stack of technologies that supports your current applications and also the capability to match your future needs.

Cloud partnerships last a long time, and it’s important to check the future roadmap of the service provider to understand if they have the mindset to catch trends early and innovate.

Some questions to focus on:

  • Will your current software and applications integrate easily with the service provider’s cloud infrastructure?
  • Do they use standard interfaces and APIs for easy integration?
  • Do they have the capability of providing hybrid cloud computing options and do they have the flexibility to host different cloud environments and systems?
  • Are they backing their capabilities with SLAs?
  • Are they willing and capable to architect solutions tailored to your business?

3. Costs

No two cloud service providers have similar or comparable pricing packages. They each have their own formula of computing cloud costs, and it is almost impossible to make a side-by-side comparison of different vendors. What you need to do is map out your organization’s requirement as minutely as possible and then decide which pricing model suits your needs.

Keep in mind:

  • Consumption timelines as long-term contracts are better priced
  • The flexibility offered by service providers in scaling up or down
  • Check for hidden costs

4. Business Health

The stability of your business depends on the stability of its partners, and you cannot underestimate the importance of a cloud partner. Before finalizing your cloud vendor, it is important to check their business and financial health.

You should check:

  • The company’s financial records
  • Management structure and other third-party relationships
  • Reputation, reviews, and referrals from existing customers
  • For any legal run-ins
  • All available third-party audits

5. Support

Do you just have phone or chat access, or does your service provider offer dedicated account management? How much support you can get from your vendor is another important criterion that must be considered before finalizing a service provider.

Find out about:

  • Time guarantees for solving technical issues
  • Access to support services – 24×7 or 12×5
  • Cost of opting for dedicated resources

Deciding on a cloud service provider is a long process that demands complete thoroughness and analysis from the CIO and the rest of the team.

Before we leave you to navigate your way to your future cloud partner, here are two more important points that must be considered – Right size and exit strategy.

Keep in mind that to get the best service you need to find a vendor who connects with you and for whom you are a valuable client.

And always plan an exit strategy in case things don’t work out.

Best of Luck!

When you are deploying a new change into production, the associated deployment should happen in a predictable manner. In simple terms, this means no disruption and zero downtime! In case you do encounter a problem or a bottleneck, the deployment strategy should include a quick rollback.

The safe strategy can be achieved by working with two identical infrastructures – the “green” environment hosting the current production and the “blue” environment with the new changes.

The business and IT teams will have an opportunity to run sanity, smoke, or any other tests in the “blue” environment before making a “Go” decision. Upon “Go”, the team switches traffic so that “blue” becomes “green” and “green” becomes “blue”.
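As a simple illustration, the pre-swap check can be scripted; below is a minimal smoke-test sketch in Python against the “blue” environment, where the health endpoint URL and the response format are hypothetical assumptions, not part of any specific Azure service.

    import requests

    # Hypothetical health endpoint exposed by the "blue" (new) environment.
    BLUE_HEALTH_URL = "https://myapp-blue.example.com/health"

    resp = requests.get(BLUE_HEALTH_URL, timeout=10)

    # Fail fast if the new environment is not healthy; the swap is only
    # performed after this smoke test passes.
    assert resp.status_code == 200, "blue environment returned a non-200 status"
    assert resp.json().get("status") == "ok", "unexpected health payload"

    print("Smoke test passed - safe to switch traffic from green to blue")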

In Azure, different processes are available for implementing the Blue-Green strategy with two environments.

We have listed below some of these techniques. Naturally, this list is not fixed and will grow continuously as new tool sets and services emerge.

  • Deployment slots – For Web Apps, deployment slots provide an easy way to implement Blue-Green deployments.
  • Azure Traffic Manager – This can be leveraged to realize Blue-Green deployments by using the weighted routing method to shift traffic between the two environments. The detailed configuration and implementation methods are available in the Azure documentation.
  • Using an Application Gateway with two backend pools and a routing rule – Create two backend resource pools, one as a stage pool and the other as a prod pool. Add the stage VMSS to the stage pool and the prod VMSS to the prod pool, and define one routing rule in the Application Gateway. Depending on whether the stage or prod VMSS should receive traffic, this rule is changed to point to the appropriate backend address pool.

CloudIQ architects and engineers have implemented Blue-Green deployment for multiple clients, and in each case, we have customized our strategies to suit their use-cases. If you are looking for a completely safe way to deploy new software versions and applications, then reach out to us at [email protected]

There are several open source tools available to manage infrastructure as code, backed by large communities of contributors, enterprise offerings, and good documentation. So why do we choose Terraform, and what makes it stand out? Terraform is used to provision infrastructure and manage infrastructure changes through versioning. It can manage low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries.

Good Fit for a Cloud-Agnostic Strategy

Enterprises are interested in mitigating the availability risk of mission-critical systems in the cloud by spreading their services across multiple cloud providers. They also look for avenues to reduce infrastructure cost by moving away from vendor lock-in. Terraform serves these use cases well: it is cloud-agnostic, allows a single configuration to be used to manage multiple providers, and can even handle cross-cloud dependencies, simplifying management and orchestration.

An Orchestration Tool

Chef, Puppet, Ansible, and SaltStack are all “configuration management” tools designed to install and manage software on existing servers, whereas Terraform is an “orchestration tool” designed to provision the servers themselves, leaving the configuration job to other tools. While there is some overlap between orchestration and configuration tools, each is a better fit for certain use cases. For example, when an infrastructure is dominated by containers and all you need to do is provision a fleet of servers, an orchestration tool like Terraform is typically a better fit than a configuration management tool.

Combat Configuration Drift

While configuration tools are best known for combating configuration drift, they typically manage only a subset of a machine’s state, which leaves gaps in the overall infrastructure state. Closing those gaps yields diminishing returns compared with the work that matters most for daily operations. This set of issues can be mitigated by using Terraform together with containers.

For example, if you tell a configuration tool to install a new version of OpenSSL, it runs the software update on your existing servers and the changes happen in place. Over time, as you apply more and more updates, each server builds up a unique history of changes, causing configuration drift. If you are using Docker with an orchestration tool such as Terraform, the Docker image is already built and ready for the new servers: new servers are deployed with the new image and the old servers are decommissioned, with all server state managed by Terraform. This immutable approach reduces the likelihood of configuration-drift bugs.

Conclusion

Overall, Terraform is an open source, cloud-agnostic orchestration tool with a strong feature set. While it may be less mature than some other tools on the market, it is a good candidate to meet this specific set of requirements.

Apache Spark is an open-source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads.

Spark provides distributed task transmission, scheduling, and I/O functionality. It provides programmers with a potentially faster and more flexible alternative to MapReduce, the software framework to which early versions of Hadoop were tied.

How Apache Spark works

Apache Spark can process data from a variety of data repositories, including the Hadoop Distributed File System (HDFS), NoSQL databases and relational data stores.

The Spark Core engine uses the resilient distributed dataset, or RDD, as its primary data type. The RDD is designed to hide much of the computational complexity from users. It aggregates data and partitions it across a server cluster, where it can then be computed and either moved to a different data store or run through an analytic model. The user doesn’t have to define where specific files are sent or what computational resources are used to store or retrieve files.

Given below is a sample Spark program, written in Python, that counts the number of records with each rating in the input file:

    from pyspark import SparkConf, SparkContext
    import collections

    conf = SparkConf().setMaster("local").setAppName("RatingsHistogram")
    sc = SparkContext(conf = conf)

    lines = sc.textFile("file:///SparkCourse/ml-100k/u.data")
    ratings = lines.map(lambda x: x.split()[2])
    result = ratings.countByValue()

    sortedResults = collections.OrderedDict(sorted(result.items()))
    for key, value in sortedResults.items():
        print("%s %i" % (key, value))


In the above code, sc is the SparkContext associated with the input file u.data. ratings is an RDD created by mapping the 3rd column of the input file (array index [2] – Ratings). Here map() is a transformation that produces a new RDD.

We can have multiple transformations in a single Spark program, each producing a new RDD from an existing RDD or an input file. countByValue() is an Action.
In Spark, transformations are not executed until an Action is triggered. This is called Lazy Evaluation.
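To make the distinction concrete, here is a small sketch that reuses the lines RDD from the program above; the filter condition is just an illustrative assumption.

    # Transformations only build up a lineage; nothing runs yet.
    ratings = lines.map(lambda x: x.split()[2])      # no Spark job is launched here
    fiveStar = ratings.filter(lambda r: r == "5")    # still nothing is executed

    # Only an Action such as count() triggers execution of the whole chain.
    print(fiveStar.count())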

Figure 1: How Apache Spark works

Spark languages

Spark was written in Scala, which is considered the primary language for interacting with the Spark Core engine. Out of the box, Spark also comes with API connectors for using Java, R, and Python.

Spark libraries

  • The Spark Core engine functions partly as an application programming interface (API) layer and underpins a set of related tools for managing and analyzing data.
  • Spark SQL — One of the most commonly used libraries, Spark SQL enables users to query data stored in disparate applications using the common SQL language.
  • Spark Streaming — This library allows users to build applications that analyze and present data in real time.
  • MLlib — A library of machine learning code that enables users to apply advanced statistical operations to data in their Spark cluster and to build applications around these analyses.
  • GraphX — A built-in library of algorithms for graph-parallel computation.

RDDs, DataFrames, and Datasets

An RDD is an immutable, distributed collection of data elements, partitioned across the nodes in a cluster, that can be operated on in parallel with a low-level API offering transformations and actions.

Like an RDD, a DataFrame is an immutable distributed collection of data. However, unlike an RDD, data is organized into named columns, like a table in a relational database.

Datasets in Apache Spark are an extension of the DataFrame API that provides a type-safe, object-oriented programming interface.
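As a quick illustration of the difference, here is a minimal sketch (with made-up rows) that builds a DataFrame with named columns from an RDD of Row objects:

    from pyspark.sql import SparkSession, Row

    spark = SparkSession.builder.appName("RddVsDataFrame").getOrCreate()

    # An RDD is just a distributed collection of objects...
    rdd = spark.sparkContext.parallelize([Row(userID=1, rating=5), Row(userID=2, rating=3)])

    # ...while a DataFrame adds named columns and a schema on top of it.
    df = spark.createDataFrame(rdd)
    df.printSchema()
    df.show()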

Executing SQL-style functions on a Dataframe

Given below is a map-reduce style program to get the list of popular movies (that is, movies rated by the most customers), using the same input data as in Figure 1 above.

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local").setAppName("PopularMovies")
    sc = SparkContext(conf = conf)

    lines = sc.textFile("file:///SparkCourse/ml-100k/u.data")
    movies = lines.map(lambda x: (int(x.split()[1]), 1))
    movieCounts = movies.reduceByKey(lambda x, y: x + y)

    flipped = movieCounts.map(lambda xy: (xy[1], xy[0]))
    sortedMovies = flipped.sortByKey()

    results = sortedMovies.collect()

    for result in results:
        print(result)


The same program, when written using DataFrames, will look like this

    from pyspark.sql import SparkSession
    from pyspark.sql import Row
    from pyspark.sql import functions

    def loadMovieNames():
        movieNames = {}
        with open("ml-100k/u.ITEM") as f:
            for line in f:
                fields = line.split('|')
                movieNames[int(fields[0])] = fields[1]
        return movieNames

    # Create a SparkSession (the config bit is only for Windows!)
    spark = SparkSession.builder.config("spark.sql.warehouse.dir",
        "file:///C:/temp").appName("PopularMovies").getOrCreate()

    # Load up our movie ID -> name dictionary
    nameDict = loadMovieNames()

    # Get the raw data
    lines = spark.sparkContext.textFile("file:///SparkCourse/ml-100k/u.data")
    # Convert it to an RDD of Row objects
    movies = lines.map(lambda x: Row(movieID=int(x.split()[1])))
    # Convert that to a DataFrame
    movieDataset = spark.createDataFrame(movies)

    # Some SQL-style magic to sort all movies by popularity in one line!
    topMovieIDs = movieDataset.groupBy("movieID").count().orderBy("count", ascending=False).cache()

    # Show the results at this point:
    # +-------+-----+
    # |movieID|count|
    # +-------+-----+
    # |     50|  584|
    # |    258|  509|
    # |    100|  508|
    topMovieIDs.show()

    # Grab the top 10
    top10 = topMovieIDs.take(10)

    # Print the results
    print("\n")
    for result in top10:
        # Each row has movieID, count as above.
        print("%s: %d" % (nameDict[result[0]], result[1]))

    # Stop the session
    spark.stop()


As you can see, DataFrames give us the flexibility to use SQL-style functions to get the required results. Because the DataFrame API is built on top of the Spark SQL engine, Spark uses the Catalyst optimizer to generate optimized logical and physical query plans.

Job Scheduling

Spark has several facilities for scheduling resources between computations.

  • Each Spark application (instance of SparkContext) runs an independent set of executor processes. The cluster managers that Spark runs on provide facilities for scheduling across applications.
  • Within each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads. This is common if the application is serving requests over the network. Spark includes a fair scheduler to schedule resources within each SparkContext, as sketched below.
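Building on the second point, a thread can direct its jobs to a named fair-scheduler pool. The sketch below assumes the application was started with spark.scheduler.mode set to FAIR and that a pool named "reports" is defined in the scheduler allocation file; both are assumptions for the example.

    # Jobs submitted from this thread are scheduled in the "reports" pool.
    sc.setLocalProperty("spark.scheduler.pool", "reports")
    reportCount = sc.textFile("file:///SparkCourse/ml-100k/u.data").count()

    # Clearing the property returns subsequent jobs to the default pool.
    sc.setLocalProperty("spark.scheduler.pool", None)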

Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards.


The Python program shown below counts the number of words in text data received from a data server listening on a TCP socket.
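A minimal version of this program, modeled on the standard Spark network_wordcount example (assuming a 1-second batch interval and two local threads), looks like this:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # Two local threads: one to receive the stream, one to process it.
    sc = SparkContext("local[2]", "NetworkWordCount")
    ssc = StreamingContext(sc, 1)   # 1-second batch interval

    # Create a DStream that connects to the given hostname:port.
    lines = ssc.socketTextStream("localhost", 9999)
    words = lines.flatMap(lambda line: line.split(" "))
    wordCounts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)

    # Print the first few counts of each batch to the console.
    wordCounts.pprint()

    ssc.start()
    ssc.awaitTermination()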

Sample input entered for this program at a terminal through Netcat, and the corresponding output of the program, are given below.

 
     
# TERMINAL 1:
# Running Netcat

$ nc -lk 9999

hello world
...

         
 
     
# TERMINAL 2: RUNNING network_wordcount.py
        
$ ./bin/spark-submit examples/src/main/python/streaming/network_wordcount.py 
localhost 9999
...
-------------------------------------------
Time: 2014-10-14 15:25:21
-------------------------------------------
(hello,1)
(world,1)
...

         

Conclusion

Since the release of Spark 1.0 in May 2014, Apache Spark has become the go-to framework for companies that work with large-scale Big Data applications. The speed and agility of Spark have made it incredibly useful across a wide range of industries.

From FMCG giants to BFSI companies to digital advertising firms – Apache Spark has proved to be indispensable when it comes to aggregating data, gleaning insights and forecasting industry trends.

Requirements:
  1. The first step in getting up and running is to install VirtualBox. You can get the appropriate version from www.virtualbox.org.
  2. Next, install Vagrant. The same procedure applies; grab the installer from www.vagrantup.com.

With both installed, we can start the cluster setup. We need a Vagrantfile that describes the cluster.

Alternatively, clone the Git repository below to get a sample Vagrantfile:

https://github.com/coreos/coreos-vagrant

Now that everything is downloaded, we can look at how to configure Vagrant for your CoreOS development environment.

  1. Copy and rename the configuration files: user-data.sample to user-data, and config.rb.sample to config.rb.
  2. Open config.rb so that you can change a few parameters to get Vagrant up and running properly.
     # Size of the CoreOS cluster created by Vagrant
            $num_instances=2
             
  3. You may also want to tweak some other settings in config.rb. CPU, Memory settings can be modified as per your need.
     #Customize VMs
            $vm_gui = false
            $vm_memory = 1024
            $vm_cpus = 1
            $vb_cpuexecutioncap = 100
             

Then open a Git shell (or any terminal) to interact with Vagrant.

Go to your working directory in the shell and issue this command:

 vagrant up
         

You will see Vagrant downloading the CoreOS box and bringing up the machines.

Once the operation is completed, you can verify that everything is up and running by logging in to one of the machines and using fleetctl to check the cluster:

 vagrant ssh core-01
        fleetctl list-machines
         

If you see the list of machines you created, then you are finished: you now have a local cluster of CoreOS machines.

This is the fifth blog in our series helping you understand all about cloud, especially when you are in a dilemma over choosing Azure or AWS, or both if needed.

Before we jumpstart on the actual comparison chart of Azure and AWS, we would like to bring you some basics on data analytics and the current trends on the subject.

If you would rather have a quick look at the comparison table, click here.

This blog is intended to help you strategize your data analytics initiatives so that you can make the most informed decision possible by analyzing all the data you need in real time. Furthermore, we also will help you draw comparisons between Azure and AWS, the two leaders in cloud, and their capabilities in Big Data and Analytics as published in a handout released by Microsoft.

Beyond doubt, this is the era of data. Every touch point of your business generates volumes of data, and that data cannot simply be cast aside: valuable business insights can be unearthed from it with a little effort. Here’s where your data analytics infrastructure helps.

A 2017 Planning Guide for Data and Analytics published by Gartner, written by analyst John Hagerty, outlines where data and analytics are heading.

The key findings as per the report are as follows:

  • Data and analytics must drive modern business operations, not just reflect them. Technical professionals must holistically manage an end-to-end data and analytics architecture to acquire, organize, analyze and deliver insights to support that goal.
  • Analytics are now infused in places where they never existed before.
  • Executives will seek strategies to better manage and monetize data for internal and external business ecosystems.
  • Data gravity is rapidly shifting to the cloud, with IoT, data providers and cloud-native applications leading the way. It is no longer a question of “if” for using cloud for data and analytics; it’s “how.”

The last point emphasizes how prominent a role cloud plays when it comes to data analytics, and if you are wondering which providers lead, Gartner in its latest Magic Quadrant names AWS and Azure as the top leaders. Now, if you are in doubt whether to go the Azure way or the AWS way, or both, here is the comparison table showing their respective Big Data and Analytics capabilities.

 

Service | Description | AWS | Azure
Elastic data warehouse | A fully managed data warehouse that analyzes data using business intelligence tools. | Redshift | SQL Data Warehouse
Big data processing | Supports technologies that break up large data processing tasks into multiple jobs, and then combine the results to enable massive parallelism. | Elastic MapReduce (EMR) | HDInsight
Data orchestration | Processes and moves data between different compute and storage services, as well as on-premises data sources, at specified intervals. | Data Pipeline | Data Factory
Data orchestration | Cloud-based ETL/data integration service that orchestrates and automates the movement and transformation of data from various sources. | AWS Glue Data Catalog | Data Factory + Data Catalog
Analytics | Storage and analysis platforms that create insights from massive quantities of data, or data that originates from many sources. | Kinesis Analytics | Stream Analytics, Data Lake Analytics, Data Lake Store
Streaming data | Allows mass ingestion of small data inputs, typically from devices and sensors, to process and route data. | Kinesis Streams, Kinesis Firehose | Event Hubs, Event Hubs Capture
Visualization | Perform ad-hoc analysis, and develop business insights from data. | QuickSight (Preview) | Power BI
Visualization | Allows visualization and data analysis tools to be embedded in applications. | — | Power BI Embedded
Search | A scalable search server based on Apache Lucene. | Elasticsearch Service | Marketplace—Elasticsearch
Search | Delivers full-text search and related search analytics and capabilities. | CloudSearch | Search
Machine learning | Produces an end-to-end workflow to create, process, refine, and publish predictive models from complex data sets. | Machine Learning | Machine Learning
Data discovery | Provides the ability to better register, enrich, discover, understand, and consume data sources. | — | Data Catalog
Data discovery | A serverless interactive query service that uses standard SQL for analyzing databases. | Amazon Athena | Data Lake Analytics

Click here to read the entire guide published by Microsoft Azure Team:

This is our fourth blog in the series of blogs intended to help you embark on a cloud strategy, most importantly when you are in a dilemma over choosing AWS or Azure, the two prominent cloud players today.

If you had missed our earlier blogs, click here

1st Blog – Compute

2nd Blog- Storage

3rd Blog- CDN & Networking

Before we jumpstart on the actual comparison chart of Azure and AWS, we would like to bring you some basics on the database aspect of cloud strategy.

If you would rather have a quick look at the database comparison table, click here.

Through this blog, let’s understand the database aspect of your cloud strategy. As per the guide, database services refer to options for storing data, whether it’s a managed relational SQL database that’s globally distributed or a multi-model NoSQL database designed for any scale.

When you decide on cloud, one of the critical decisions you face is which database to use – SQL or NoSQL. Though SQL has an impressive track record, NoSQL is not far behind, as it is gradually making notable gains and has many proponents. Once you have picked your database, the other big decision is which cloud vendor to choose among the many on offer.

Here’s where Gartner’s prediction is worth considering; the research company published a document that states:

“Public cloud services, such as Amazon Web Services (AWS), Microsoft Azure and IBM Cloud, are innovation juggernauts that offer highly operating-cost-competitive alternatives to traditional, on-premises hosting environments.

Cloud databases are now essential for emerging digital business use cases, next-generation applications and initiatives such as IoT. Gartner recommends that enterprises make cloud databases the preferred deployment model for all new business processes, workloads, and applications. As such, architects and tech professionals should start building a cloud-first data strategy now, if they haven’t done so already”

Reinstating the trend, Gartner recently published a new Magic Quadrant for infrastructure as a service (IaaS) that – surprising nobody – has Amazon Web Services and Microsoft alone in the Leaders quadrant, with a few others outside it.

 

Now, the question really is, Azure or AWS for your cloud data? Or should it be both? Here’s a quick comparison table to guide you.

Service | Description | AWS | Azure
Relational database | SQL Database is a high-performance, reliable, and secure database you can use to build data-driven applications and websites, without needing to manage infrastructure. | RDS | SQL Database, including Postgres and MySQL
NoSQL—document storage | A globally distributed, multi-model database that natively supports multiple data models: key-value, documents, graphs, and columnar. | DynamoDB | Cosmos DB
NoSQL—key/value storage | A non-relational data store for semi-structured data. | DynamoDB and SimpleDB | Table Storage
Caching | An in-memory, distributed caching service that provides a high-performance store typically used to offload non-transactional work from a database. | ElastiCache | Redis Cache
Database migration | Focuses on migration of database schema and data from one database format to a specific database technology in the cloud. | Database Migration Service (Preview) | SQL Database Migration Wizard

Click here to read the entire guide published by Microsoft Azure Team:

In line with our latest blog series highlighting how common cloud services are made available via Azure and Amazon Web Services (AWS), as published by Microsoft, this third blog in the series helps you understand Cloud Networking and Content Delivery capabilities of both Azure and AWS.

Before we jumpstart on the actual comparison chart of Azure and AWS, we would like to bring you some basics on cloud content delivery networking and the current trends on the subject.

If you would rather have a quick look at the comparison table, click here.

When we talk about cloud Content Delivery Network (CDN) and the related networking capabilities it includes all the hardware and software that allows you to easily provision private networks, connect your cloud application to your on-premises datacenters, and more.

According to Gartner, Content delivery networks (CDNs) are a type of distributed computing infrastructure, where devices (servers or appliances) reside in multiple points of presence on multi-hop packet-routing networks, such as the Internet, or on private WANs. A CDN can be used to distribute rich media downloads or streams, deliver software packages and updates, and provide services such as global load balancing, Secure Sockets Layer acceleration and dynamic application acceleration via WAN optimization techniques.

In simpler terms, these highly distributed server platforms are optimized to deliver content in a way that improves the customer experience. Hence, it is important to decrease latency by keeping data closer to users and to protect it from security threats, while ensuring rapid, streamlined content delivery, including general web delivery, content purge, content caching, and tracking history for as long as 90 days.

As per G2Crowd.com, most organizations use CDN services, such as web caching, request routing, and server load balancing, to reduce load times and improve website performance. Further, to qualify as a CDN provider, a service provider must:

  • Allow access to a geographically dispersed network of PoPs in multiple data centers
  • Help websites access this network to deliver content to website visitors
  • Offer services designed to improve website performance
  • Provide scalable Internet bandwidth allowances according to customer needs
  • Maintain data center(s) of servers to reduce the possibility of overloading individual instances

With this background, let’s look at the AWS vs Azure comparison chart in terms of Networking and Content Delivery Capabilities:

Service | Description | AWS | Azure
Cloud virtual networking | Provides an isolated, private environment in the cloud. | Virtual Private Cloud | Virtual Network
Cross-premises connectivity | Connects Azure virtual networks to other Azure virtual networks or customer on-premises networks. It also supports VPN tunneling. | AWS VPN Gateway | VPN Gateway
Domain name system management | Manage DNS records using the same credentials, billing, and support contract as other Azure services. | Route 53 | DNS
Domain name system management | Service that hosts domain names, routes users to Internet applications, manages traffic to apps, and improves app availability with automatic failover. | Route 53 | Traffic Manager
Content delivery network | Global content delivery network that transfers audio, video, applications, images, and other files. | CloudFront | Content Delivery Network
Dedicated network | Establishes a dedicated, private network connection from a location to the cloud provider. | Direct Connect | ExpressRoute
Load balancing | Automatically distributes incoming application traffic to add scale, handle failover, and route to a collection of resources. | Elastic Load Balancing | Load Balancer, Application Gateway

To read more about the Microsoft guide which briefs all about cloud by drawing comparisons between Azure or AWS, click here (link to PDF download)

You may also like to read our previous blogs in these series, if so, please click here:

https://www.cloudiqtech.com/azure-vs-aws-compute/
https://www.cloudiqtech.com/aws-vs-azure-cloud-storage/

Azure or AWS or Azure & AWS? What’s your cloud strategy for Storage?

This is our second blog, in our latest blog series helping you understand all about cloud, especially when you are in doubt whether to go Azure or AWS or both.

To read our first blog talking about Cloud strategy in general and Compute in particular, click here…

Moving on, in this blog let’s find what Azure or AWS offer when it comes to Storage Capabilities for your Cloud Infrastructure.

Globally, CIOs are increasingly looking to stop running their own data centers and move to the cloud, which is evident in the projection made by a leading researcher, MarketsandMarkets. They reported that the global cloud storage market will grow from $18.87 billion in 2015 to $65.41 billion by 2020, at a compound annual growth rate (CAGR) of 28.2 percent during the forecast period.

Reinstating the fact, 451 Research’s Voice of the Enterprise survey last year stated that Public cloud storage spending will double by next year (2017). “IT managers are recognizing the need for storage transformation to meet the realities of the new digital economy, especially in terms of improved efficiency and agility in the face of relentless data growth,” said Simon Robinson, research vice president at 451 and research director of the new Voice of the Enterprise: Storage service. “It’s clear from our Q4 study that emerging options, especially public cloud storage and all-flash array technologies, will be increasingly important components in this transformation” he added further.

As we see, many companies are undoubtedly in for cloud storage. But the big question still prevails: whom to choose from the gamut of leading public cloud players, including Azure and AWS? Should it be Azure alone for your cloud storage, AWS, or a combination of both?

This needs a thorough understanding. To help you decide, we have reproduced a guide published by Microsoft that compares Azure’s capabilities with AWS when it comes to cloud strategy. We will look at the storage part in this blog, but before that, a little background on cloud storage.

When we talk about cloud storage device mechanisms, we include all logical units of data storage covering from files, blocks, and datasets to objects and their relative storage interfaces. These instances of virtual storage devices are designed specifically for cloud-based provisioning and can be scaled as per need. It is to be noted that different cloud service consumers utilize different technologies to interface with virtualized cloud storage devices.

Service | Description | AWS | Azure
Object storage | Object storage service for use cases including cloud apps, content distribution, backup, archiving, disaster recovery, and big data analytics. | Simple Storage Services (S3) | Storage—Block Blob (for content logs, files) (Standard—Hot)
Virtual server disk infrastructure | SSD storage optimized for I/O intensive read/write operations. | Elastic Block Store (EBS) | Disk Storage—Page Blobs (for VHDs or other random-write type data), Disk Storage—Premium Storage
Shared file storage | A simple interface to create and configure file systems quickly as well as share common files. | Elastic File System | File Storage (file share between VMs)
Archiving—cool storage | A lower cost tier for storing data that is infrequently accessed and long-lived. | S3 IA, Glacier | Storage—Hot, Cool & Archive Tier
Backup | Backup and archival solutions that allow files and folders to be backed up and recovered from the cloud, and provide off-site protection against data loss. | Backup and Recovery | Backup
Hybrid storage | Integrates on-premises IT environments with cloud storage. Automates data management and storage, plus supports disaster recovery. | Storage Gateway | StorSimple
Bulk data transfer | A data transport solution that uses secure disks and appliances to transfer substantial amounts of data; petabyte- to exabyte-scale data transport solutions. | AWS Import/Export Disk, AWS Import/Export Snowball, AWS Snowball Edge, AWS Snowmobile | Import/Export, Data Box
Disaster recovery | Automates protection and replication of virtual machines with health monitoring, recovery plans, and recovery plan testing. | — | Site Recovery

For a more detailed understanding download the document here

Surprisingly, as per an article published by Gartner, “Cloud computing is still perplexing to many CIOs even after a decade of cloud.” While cloud computing is a foundation for digital business, Gartner estimates that less than one-third of enterprises have a documented cloud strategy. This comes as a surprise given that cloud has evolved from a disruption into the indispensable technology of today and tomorrow, strategically adopted by many progressive companies along the way.

In the same article Donna Scott, Vice President and distinguished analyst at Gartner states that “Cloud computing will become the dominant design style for new applications and for refactoring a large number of existing applications over the next 10-plus years”. She also added that “A cloud strategy clearly defines the business outcomes you seek, and how you are going to get there. Having a cloud strategy will enable you to apply its tenets quickly with fewer delays, thus speeding the arrival of your ultimate business outcomes.”

However, it is easier said than done. Many top businesses still have questions: How do we make the most of cloud computing? What kind of architectures and techniques should be strategized to support the many flavors of evolving cloud computing? Private or public? Hybrid or public? Azure or AWS, or should it be a hybrid combo?

Through a series of blogs, we intend to answer these questions. As a first step, we would like to present a comparative cloud service map focusing on Azure and AWS, both leaders in public cloud platforms, as published by Microsoft.

The well-researched article draws detailed comparisons between Azure and AWS, showing how common cloud services across parameters such as Marketplace, Compute, Storage, Networking, Database, Analytics, Big Data, Intelligence, IoT, Mobile, and Enterprise Integration are made available via Azure and Amazon Web Services (AWS).

It should be noted that, as prominent public cloud platform providers, Azure and AWS each offer businesses a wide and comprehensive set of capabilities across the globe. Many organizations have chosen one of them, or both, depending on their needs, in order to gain more agility and flexibility while minimizing risk and maximizing the benefits of a multi-cloud environment.

For starters, let’s look at Compute and the points one should consider and compare before deciding on the Azure or AWS approach, or a combination of both.

Service | Description | AWS | Azure
Virtual servers | Allows users to deploy, manage, and maintain OS and server software; instance types provide configurations of CPU/RAM. Also offers a lightweight, simplified product offering users can choose from when building out a virtual machine. | Elastic Compute Cloud (EC2) VMs, Amazon Lightsail | Virtual Machines, Virtual Machine Images
Container management | Supports Docker containers and allows users to run applications on managed instance clusters. Also allows customers to store Docker formatted images, used to create all types of container deployments on Azure. | EC2 Container Service (ECS), EC2 Container Registry | Container Service, Container Registry
Microservice-based applications | Orchestrates and manages the execution, lifetime, and resilience of complex, interrelated code components that can be either stateless or stateful. | — | Service Fabric
Backend process logic | Integrates systems and runs backend processes in response to events or schedules without provisioning or managing servers. | Lambda | Functions, Event Grid
Job orchestration | When processing across hundreds or thousands of compute nodes, this tool orchestrates the tasks and interactions between compute resources that are necessary. | AWS Batch | Batch
Scalability | Automatically changes the number of instances providing a compute workload. Users set defined metrics and thresholds that determine if the platform adds or removes instances. | AWS Auto Scaling | Virtual Machine Scale Sets, App Service Scale Capability (PaaS), AutoScaling
Pre-defined templates | Community-led templates for creating and deploying virtual machine-based solutions. | AWS Quick Start | Quickstart templates

For a more detailed understanding download the document here

A microservices-based architecture introduces agility and flexibility and supports a sustainable DevOps culture, ensuring closer collaboration within businesses; the good news is that it is actually happening for those who have embraced it.

True, monolithic application architectures have enabled businesses to benefit from IT all along: a monolith is a single code base that is simple to develop, test, and run. Since monoliths are also based on a logical, modular hexagonal or layered architecture (a presentation layer responsible for handling HTTP requests and responding with HTML or JSON/XML, a business logic layer, database access, and application integration), they cover and tie together all processes and functions to an extent.

Despite these facts, monolithic software, which was instrumental in helping businesses embrace IT in their initial stages and which still exists today, is running into problems. Increasingly complex business operating conditions are largely to blame.

So, how do businesses today address the new pressures caused by digitization, continuous technology disruptions, increased customer awareness and interactions, and sudden regulatory interventions? The answer lies in the agility, flexibility, and scalability of the underlying IT infrastructure – the pillars of rapid adaptability to change.

Monolithic apps, even when based on a well-designed three-tier architecture, lose fluidity and turn rigid in the long run. Irrespective of their modularity, modules are still dependent on each other, and even a minimal change in one module requires building and deploying all artifacts to each server pool across the distributed environment.

Besides, whenever there is a critical problem, the blame game starts among the UI developers, business logic experts, backend developers, database programmers, and so on, since each is predominantly an expert in their own domain with little knowledge of the other processes. As the complexity of business operations sets in, the agility, flexibility, and scalability of your software are severely tested in a monolithic environment.

Here’s where microservices play a huge role: the underlying architecture helps you break your software applications into independent, loosely coupled services that can be deployed and managed at that level without depending on other services.

For example, if your project needs you to design and manage inventory, sales, shipping, and billing and UI shopping cart modules, you can break each service down as an independently deployable module. Each has its own database, where monitoring and maintenance of application servers are done independently as the architecture allows you to decentralize the database, reducing complexity. Besides it enables continuous delivery/deployment of large, complex applications which means technology also evolves along with the business.

The other important aspect is that microservices promote a culture wherein whoever develops a service is also responsible for managing it. This avoids the handover concept and the misunderstandings and conflicts that follow whenever there is a crisis.

In line with the DevOps concept, Microservices enables easy collaboration between the development and operations team as they embrace and work on a common toolset that establishes common terminology, as well as processes for requirements, dependencies, and problems. There is no denying the fact that DevOps and microservices work better when applied together.

Perhaps that’s the reason companies like Netflix, Amazon, etc are embracing the concept of microservices in their products. And for other new businesses embracing it, a new environment where agility, flexibility and closer collaboration between business and technology becomes a reality providing the much-needed edge in these challenging times.

Allow Access to an S3 Bucket Only from a VPC

Currently, I am evaluating options to lock down permissions to my S3 buckets as part of security enhancements. These are the steps I followed to lock down S3 bucket access to my VPC only.

Create VPC End Points


Attach the S3 Bucket Policy to Restrict Access
{
	"Version": "2012-10-17",
	"Id": "Policy123456789",
	"Statement": [
		{
			"Sid": "Stmt123456789",
			"Effect": "Deny",
			"Principal": "*",
			"Action": "s3:*",
			"Resource": "arn:aws:s3:::example-confidential/*",
			"Condition": {
				"StringNotEquals": {
					"aws:sourceVpc": "vpc-2f2b202b"
				}
			}
		}
	]
}
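If you would rather apply the same policy programmatically, here is a minimal sketch using boto3 (assuming credentials that are allowed to call s3:PutBucketPolicy; the bucket name and VPC ID are the ones used above):

    import json
    import boto3

    bucket_policy = {
        "Version": "2012-10-17",
        "Id": "Policy123456789",
        "Statement": [{
            "Sid": "Stmt123456789",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::example-confidential/*",
            "Condition": {"StringNotEquals": {"aws:sourceVpc": "vpc-2f2b202b"}}
        }]
    }

    s3 = boto3.client("s3")
    # Attach the deny-outside-VPC policy to the bucket.
    s3.put_bucket_policy(Bucket="example-confidential", Policy=json.dumps(bucket_policy))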
Access the Buckets Outside VPC

Once you have attached the policy, if you access the S3 files through the console from outside the VPC, you will receive an Access Denied error.

AccessDenied Access Denied 14FB0BEFD8A0C8E5 JrFOr/6Fe20lyMxjCy6lPhJIJ8sj3kG7zSiel2kcvv6OUssHQ2W/e7bYTjD3hXjX2m1/aHB+G1I=
Access the Buckets from VPC

If you log into an EC2 instance hosted in the VPC, you will be able to access the S3 bucket.

SSH into your EC2 machine and verify its VPC through the instance metadata store.

[ec2-user]# curl http://169.254.169.254/latest/meta-data/network/interfaces/macs/
01:ed:88:51:f6:29/
[ec2-user]# curl http://169.254.169.254/latest/meta-data/network/interfaces/macs/01:ed:88:51:f6:29/vpc-id
vpc-2f2b202b

If you execute S3 commands against the bucket, you will be able to access it without an Access Denied error.

aws s3 ls example-confidential
aws s3 cp s3://example-confidential/SampleConfidentialFile.txt SampleConfidentialFile.txt

Cloud-based video analysis is an emerging field that strives to solve and automate video analysis in real time or near real time. The engine that drives the solution is a set of cloud-based APIs supported by cloud providers such as AWS, Azure, Google Cloud, etc. These APIs are built on top of computer vision, face recognition, and object tracking. They are REST based, take a video frame or set of frames, and return a JSON document that summarizes the analysis result and the percentage of confidence. To achieve real-time or near real-time analysis, the enterprise solution needs to address the following constraints:

 

  • Process the streaming video input into smaller frame sets and process them in parallel – this allows for efficient processing
  • Use advanced heuristics and machine learning to minimize calls to the APIs – the cloud APIs for cognitive services are priced by the number of calls, so using heuristics and machine learning to infer results reduces the overall cost

 

Use-case solved:

The solution we built streams live video from a series of traffic cameras operating simultaneously and tries to find vehicles that are running red lights or are pulled over onto curbs. We also filter out sensitive content from the video if frames that match the criteria need to be displayed on the user interface.

 

Solution:

The streaming video is broken down into frame sets of 10 seconds. These frames are queued up in an Azure Service Bus queue. An Azure Function then analyzes the frames for the existence of objects using an open-source computer vision library. Frame sets with no objects are not sent to Cognitive Services; we also apply other heuristics and CV analysis to pre-determine whether a call to the cloud cognitive-services API is needed at all. Once a frame set is marked ready for Cognitive Services, it is sent to a different Service Bus queue. Another Azure Function calls Cognitive Services and gathers statistics for the frame set. Based on configuration, this function determines which frames are identified as matches and forwards them to another Service Bus queue. A third Azure Function processes these frame sets, blurs sensitive content, and stores them in Azure Blob Storage. The matched content can be viewed in a Node.js and Angular 2 based web application running in Azure Container Service.
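As an illustration of one stage in this pipeline, here is a minimal sketch of the first Azure Function in Python. The queue bindings (configured in function.json), the frame-set message format, and the has_objects flag are assumptions made for the example, not the exact implementation used in the project.

    import json
    import azure.functions as func

    def main(msg: func.ServiceBusMessage, outmsg: func.Out[str]) -> None:
        # The incoming Service Bus message carries one 10-second frame set.
        frame_set = json.loads(msg.get_body().decode("utf-8"))

        # Placeholder for the open-source computer vision check described above;
        # here we assume each frame was pre-tagged with a has_objects flag.
        frames_with_objects = [f for f in frame_set["frames"] if f.get("has_objects")]

        if frames_with_objects:
            # Forward only frame sets that contain objects to the
            # "ready for Cognitive Services" queue via the output binding.
            outmsg.set(json.dumps({"id": frame_set["id"], "frames": frames_with_objects}))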

 

Design:

Figure: Real-time video analysis pipeline design

 

Results & Conclusion:
  • Able to achieve real time analysis with minimal API cost
  • Able to scale horizontally for multiple video streams
  • Able to achieve multiple analysis objectives on video streams

AWS region code | AWS region name | Number of AZs | AZ names
us-west-2 | Oregon | 3 | us-west-2a, us-west-2b, us-west-2c
ap-southeast-2 | Sydney | 2 | ap-southeast-2a, ap-southeast-2b, ap-southeast-2c
us-east-1 | Virginia | 4 | us-east-1a, us-east-1b, us-east-1c, us-east-1e
us-west-1 | N. California | 2 | us-west-1a, us-west-1b
eu-west-1 | Ireland | 3 | eu-west-1a, eu-west-1b, eu-west-1c
eu-central-1 | Frankfurt | 2 | eu-central-1a, eu-central-1b
ap-southeast-1 | Singapore | 2 | ap-southeast-1a, ap-southeast-1b
ap-northeast-1 | Tokyo | 2 | ap-northeast-1a, ap-northeast-1c
sa-east-1 | Sao Paulo | 3 | sa-east-1a, sa-east-1b, sa-east-1c

 
                
                C:\Users\Raju> aws ec2 describe-regions
                C:\Users\Raju> aws ec2 describe-availability-zones

The first command lists the AWS regions available for EC2; the second lists the Availability Zones in the currently configured region.
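The same information can also be pulled programmatically; here is a small sketch using boto3 (the region name is just an example):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # List all regions available to the account.
    for region in ec2.describe_regions()["Regions"]:
        print(region["RegionName"], region["Endpoint"])

    # List the Availability Zones in the client's region.
    for az in ec2.describe_availability_zones()["AvailabilityZones"]:
        print(az["ZoneName"], az["State"])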

Here is a look at some common queries that are useful when troubleshooting an Aurora database.

Number of Connections by Host
SELECT SUBSTRING(HOST, 1, 10), DB, USER, COUNT(*) AS Count
    FROM information_schema.processlist
    GROUP BY SUBSTRING(HOST, 1, 10), DB, USER
    ORDER BY Count DESC;
    -- '10.10.50.22', 'Portal', 'webguest-dev', '46'
Aurora Max Connections
 select AURORA_VERSION();
        select * from mysql.slow_log 
        where sql_text not like '%LOAD DATA%'
        order by query_time desc
        limit 1000 ;
         
        select count(*) from mysql.general_log 
        where user_host not like 'rdsadmin%'
        and user_host not like '[rdsadmin]%'
        and event_time > '2017-06-15 18:51:14';
         
        select current_timestamp();
         
        desc mysql.general_log  ;
         
        select @@MAX_CONNECTIONS
        -- '4000'
         
        select *  from mysql.general_log 
        where command_type like '%Connect%';
         
        select *  from mysql.general_log_backup 
        where command_type like '%Connect%' ;
         
        SHOW GLOBAL STATUS LIKE '%Connection_errors%';
         
        SHOW STATUS WHERE `variable_name` = 'Threads_connected';
Monitor Memory Optimized Table Space Usage

Memory-optimized tables are fully durable by default, and, like transactions on (traditional) disk-based tables, transactions on memory-optimized tables are fully atomic, consistent, isolated, and durable (ACID). Memory-optimized tables and natively compiled stored procedures support only a subset of Transact-SQL features. The following queries show how to monitor table space usage.

                ;
        WITH    system_allocated_memory ( system_allocated_memory_in_mb )
              AS ( SELECT   ISNULL(( SELECT CONVERT(DECIMAL(18, 2), 
             ( SUM(TMS.memory_allocated_for_table_kb)
             + SUM(TMS.memory_allocated_for_indexes_kb) )
             / 1024.00)
             FROM   [sys].[dm_db_xtp_table_memory_stats] TMS
             WHERE  TMS.object_id <= 0
             ), 0.00)
         ),
             table_index_memory ( table_used_memory_in_mb, table_unused_memory_in_mb, 
             index_used_memory_in_mb, index_unused_memory_in_mb )
             AS ( SELECT   ISNULL(( SELECT CONVERT(DECIMAL(18, 2), 
             ( SUM(TMS.memory_used_by_table_kb)
             / 1024.00 ))
             ), 0.00) AS table_used_memory_in_mb ,
             ISNULL(( SELECT CONVERT(DECIMAL(18, 2), ( SUM(TMS.memory_allocated_for_table_kb)
             - SUM(TMS.memory_used_by_table_kb) )
             / 1024.00)
             ), 0.00) AS table_unused_memory_in_mb ,
             ISNULL(( SELECT CONVERT(DECIMAL(18, 2), ( SUM(TMS.memory_used_by_indexes_kb)
             / 1024.00 ))
             ), 0.00) AS index_used_memory_in_mb ,
             ISNULL(( SELECT CONVERT(DECIMAL(18, 2), ( SUM(TMS.memory_allocated_for_indexes_kb)
             - SUM(TMS.memory_used_by_indexes_kb) )
             / 1024.00)
             ), 0.00) AS index_unused_memory_in_mb
             FROM     [sys].[dm_db_xtp_table_memory_stats] TMS
            WHERE    TMS.object_id > 0
           )
        SELECT  s.system_allocated_memory_in_mb ,
                t.table_used_memory_in_mb ,
                t.table_unused_memory_in_mb ,
                t.index_used_memory_in_mb ,
                t.index_unused_memory_in_mb ,
            ISNULL(( SELECT DATABASEPROPERTYEX(DB_NAME(DB_ID()),
            'IsXTPSupported')
            ), 0) AS has_memory_optimized_filegroup
        FROM    system_allocated_memory s ,
           table_index_memory t
        
        SELECT  t.object_id ,
            t.name ,
            ISNULL(( SELECT CONVERT(DECIMAL(18, 2), ( TMS.memory_used_by_table_kb )
            / 1024.00)
            ), 0.00) AS table_used_memory_in_mb ,
                ISNULL(( SELECT CONVERT(DECIMAL(18, 2), ( TMS.memory_allocated_for_table_kb
                    - TMS.memory_used_by_table_kb )
            / 1024.00)
            ), 0.00) AS table_unused_memory_in_mb ,
                ISNULL(( SELECT CONVERT(DECIMAL(18, 2), ( TMS.memory_used_by_indexes_kb )
            / 1024.00)
            ), 0.00) AS index_used_memory_in_mb ,
                ISNULL(( SELECT CONVERT(DECIMAL(18, 2), ( TMS.memory_allocated_for_indexes_kb
                    - TMS.memory_used_by_indexes_kb )
            / 1024.00)
            ), 0.00) AS index_unused_memory_in_mb
        FROM    sys.tables t
            JOIN sys.dm_db_xtp_table_memory_stats TMS ON ( t.object_id = TMS.object_id )
          
           
All Memory Used by Memory Optimized Table across Database Engine
                
        -- this DMV accounts for all memory used by the in-memory OLTP (Hekaton/XTP) engine
        SELECT type ,
        name ,
        memory_node_id ,
        pages_kb / 1024 AS pages_MB
        FROM sys.dm_os_memory_clerks
        WHERE type LIKE '%xtp%' 
Enable Natively Compiled Stored Procedure Stats Collection
                
        EXEC [sys].[sp_xtp_control_proc_exec_stats] @new_collection_value = 1  
        DECLARE @c BIT  
        EXEC sp_xtp_control_proc_exec_stats @old_collection_value = @c OUTPUT  
        SELECT  @c AS 'collection status' 
DBCC FREEPROCCACHE does not remove natively compiled stored procedures from Plan Cache
                -- https://connect.microsoft.com/SQLServer/Feedback/Details/3126441
        
        DECLARE @sql NVARCHAR(MAX) = N''
        
        SELECT  @sql += N'EXECUTE sp_recompile N'''
                + QUOTENAME(SCHEMA_NAME(o.schema_id)) + N'.' + QUOTENAME(o.name) + '''
        '
        FROM    sys.sql_modules sm
                JOIN sys.objects o ON sm.object_id = o.object_id
        WHERE   uses_native_compilation = 1
        
        EXECUTE sp_executesql @sql
        
                
        -- Reset wait and latch statistics.
        DBCC SQLPERF('sys.dm_os_latch_stats' , CLEAR)
        DBCC SQLPERF('sys.dm_os_wait_stats' , CLEAR) 
Errors Encountered During Migration:

Msg 41317, Level 16, State 5, Line 6
A user transaction that accesses memory optimized tables or natively compiled modules cannot access more than one user database or databases model and msdb, and it cannot write to master.

Windows Containers do not ship with Active Directory support and, due to their nature, can’t (yet) act as full-fledged domain-joined objects, but a certain level of Active Directory functionality can be supported through the use of group Managed Service Accounts (gMSA).

Although Windows Containers cannot be domain-joined, they can still take advantage of Active Directory domain identities, similar to when a device is realm-joined. With Windows Server 2012 R2 domain controllers, Microsoft introduced a new domain account called a group Managed Service Account (gMSA), which was designed to be shared by services.

https://blogs.technet.microsoft.com/askpfeplat/2012/12/16/windows-server-2012-group-managed-service-accounts/

https://technet.microsoft.com/en-us/library/hh831782(v=ws.11).aspx

We can authenticate to Active Directory resources from a Windows container that is not part of the domain. For this to work, certain prerequisites need to be met.

For one, your container hosts must be part of Active Directory, and you must be able to utilize group Managed Service Accounts.
https://technet.microsoft.com/en-us/library/hh831782%28v=ws.11%29.aspx?f=255&MSPPError=-2147217396

The following steps are needed for a Windows container to communicate with an on-premises SQL Server using a GMSA.
The environment used for this post is described below.

  1. Active directory Domain Controller installed on server CloudIQDC1.
    • OS – Windows Server 2012/2016.
    • The domain name is cloudiq.local
  2. Below are the Domain members (Computers) joined in DC
    • CIQ-2012R2-DEV
    • CIQSQL2012
    • CIQ-WIN2016-DKR
    • cloud-2016
  3. SQL server installed on CIQSQL2012. This will be used for GMSA testing.
    • OS – Windows 2012
  4. cloud-2016 will be used to test the GMSA connection.
    • This is the container host we are using to connect on premise SQL server using GMSA account.

  5. The GMSA account name is “container_gmsa”. We will create this and configure it.
Step 1: Create the KDS Root Key
  1. We can generate this only once per domain.
  2. This is used by the KDS service on DCs (along with other information) to generate passwords.
  3. Login to domain controller.
  4. Open PowerShell and execute the below.
                            Import-Module ActiveDirectory
        Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))
         

  5. Verify your key using the below command.
            Get-KdsRootKey
         
Step 2: Create GMSA account
  1. Create the GMSA account using the command below.
            
        New-ADServiceAccount -Name container_gmsa -DNSHostName cloudiq.local 
        -PrincipalsAllowedToRetrieveManagedPassword "Domain Controllers", "domain admins", 
        "CN=Container Hosts,CN=Builtin, DC=cloudiq, DC=local" -KerberosEncryptionType RC4, AES128, AES256
         

  2. Use below command to verify the created GMSA account.
            Get-ADServiceAccount -Identity container_gmsa 
  3. If everything works as expected, you’ll notice a new gMSA object in your domain’s Managed Service Accounts container.
Step 3: Add GMSA account to Servers where we are going to use.
  1. Open the Active directory Admin Center.
  2. Select the container_gmsa account and click on properties.
  3. Select the security and click on add.
  4. Select only Computers
  5. Select Computers you want to use GMSA. In our case we need to add CIQSQL2012 and cloud-2016.
  6. Reboot the domain controller first so that these changes take effect.
  7. Reboot the computers that will be using the GMSA. In our case, we need to reboot CIQSQL2012 and cloud-2016.
  8. After reboots, login to Domain controller. Execute the below command.
            
        Set-ADServiceAccount -Identity container_gmsa -PrincipalsAllowedToRetrieveManagedPassword 
        CloudIQDC1$,cloud-2016$, CIQSQL2012$
         

Step 4: Install GMSA Account on Servers
  1. Log in to the system that will use the GMSA account. In our case, log in to cloud-2016. This is the container host we are using to connect to the on-premises SQL Server using the GMSA account.
  2. Execute the below command if AD features are not available.
            
        Enable-WindowsOptionalFeature -FeatureName ActiveDirectory-Powershell -online -all
         
  3. Execute the below commands
            Get-ADServiceAccount -Identity container_gmsa
        Install-ADServiceAccount -Identity container_gmsa
        Test-AdServiceAccount -Identity container_gmsa 

  4. If everything is working as expected, you need to create a credential spec file, which must be passed to Docker during container creation so the container can use this service account. Run the commands below to download the module from Microsoft's GitHub account; it will create a JSON file containing the required data.
            
        Invoke-WebRequest "https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/live/windows-server-container-tools/ServiceAccounts/CredentialSpec.psm1" 
        -UseBasicParsing -OutFile $env:TEMP\cred.psm1
        Import-Module $env:temp\cred.psm1
        New-CredentialSpec -Name Gmsa -AccountName container_gmsa
        #This will return location and name of JSON file
        Get-CredentialSpec 

Step 5: SQL Server Configuration to allow GMSA
  1. On the SQL Server, create a login for the GMSA account and add it to the sysadmin role. Based on your on-premises DB access requirements, you can assign suitable roles.
            CREATE LOGIN [cloudiq\container_gmsa$] FROM WINDOWS
        sp_addsrvRolemember "cloudiq\container_gmsa$", "sysadmin" 
Getting started with AWS Data Pipeline

AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data. With AWS Data Pipeline, you can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks.
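Pipelines are usually defined in the console, but they can also be created programmatically. The sketch below uses boto3 to create an empty pipeline shell (the name and unique ID are placeholders), which can then be given a definition and activated:

    import boto3

    dp = boto3.client("datapipeline", region_name="us-east-1")

    # create_pipeline only registers the pipeline; a definition must be
    # added (for example with put_pipeline_definition) before activation.
    response = dp.create_pipeline(name="s3-to-mysql-demo", uniqueId="s3-to-mysql-demo-001")
    pipeline_id = response["pipelineId"]
    print("Created pipeline:", pipeline_id)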

Figure: AWS Data Pipeline sample workflow
Default IAM Roles

AWS Data Pipeline requires IAM roles to determine what actions your pipelines can perform and who can access your pipeline’s resources.

The AWS Data Pipeline console creates the following roles for you:

DataPipelineDefaultRole

DataPipelineDefaultResourceRole

DataPipelineDefaultRole:
 
                {
                "Version": "2012-10-17",
                "Statement": [
                 {
                 "Effect": "Allow",
                 "Action": [
                 "s3:List*",
                 "s3:Put*",
                 "s3:Get*",
                 "s3:DeleteObject",
                 "dynamodb:DescribeTable",
                 "dynamodb:Scan",
                 "dynamodb:Query",
                 "dynamodb:GetItem",
                 "dynamodb:BatchGetItem",
                 "dynamodb:UpdateTable",
                 "ec2:DescribeInstances",
                 "ec2:DescribeSecurityGroups",
                 "ec2:RunInstances",
                 "ec2:CreateTags",
                 "ec2:StartInstances",
                 "ec2:StopInstances",
                 "ec2:TerminateInstances",
                 "elasticmapreduce:*",
                 "rds:DescribeDBInstances",
                 "rds:DescribeDBSecurityGroups",
                 "redshift:DescribeClusters",
                 "redshift:DescribeClusterSecurityGroups",
                "sns:GetTopicAttributes",
                 "sns:ListTopics",
                 "sns:Publish",
                 "sns:Subscribe",
                 "sns:Unsubscribe",
                 "iam:PassRole",
                 "iam:ListRolePolicies",
                 "iam:GetRole",
                 "iam:GetRolePolicy",
                 "iam:ListInstanceProfiles",
                 "cloudwatch:*",
                 "datapipeline:DescribeObjects",
                 "datapipeline:EvaluateExpression"
                 ],
                 "Resource": [
                 "*"
                 ]
                 }
                ]
                } 
DataPipelineDefaultResourceRole:
 
                {
                "Version": "2012-10-17",
                "Statement": [
                {
                 "Effect": "Allow",
                 "Action": [
                 "s3:List*",
                 "s3:Put*",
                 "s3:Get*",
                 "s3:DeleteObject",
                 "dynamodb:DescribeTable",
                 "dynamodb:Scan",
                 "dynamodb:Query",
                 "dynamodb:GetItem",
                 "dynamodb:BatchGetItem",
                 "dynamodb:UpdateTable",
                 "rds:DescribeDBInstances",
                 "rds:DescribeDBSecurityGroups",
                 "redshift:DescribeClusters",
                 "redshift:DescribeClusterSecurityGroups",
                 "cloudwatch:PutMetricData",
                 "datapipeline:*"
                 ],
                 "Resource": [
                 "*"
                 ]
                }
                ]
                } 
Error Message:

Unable to create resource for @EC2ResourceObj_2017-05-05T04:25:32 due to: No default VPC for this user (Service: AmazonEC2; Status Code: 400; Error Code: VPCIdNotSpecified; Request ID: bd2f3abb-d1c9-4c60-977f-6a83426a947d)

Resolution:

When you look at your VPC settings, you will notice that no default VPC is configured. When launching an EC2 instance, Data Pipeline cannot determine which VPC to use by default, so the subnet must be specified explicitly in the resource configuration.

SubNetID for EC2 Resource

Default VPC
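
If you would rather restore a default VPC for the account than set the subnet on every pipeline, newer AWS CLI versions can recreate it (an alternative sketch; run it in the affected region):

        # Recreate the default VPC for the current region
        aws ec2 create-default-vpc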
Build a Sample Data Pipeline to Load an S3 File into a MySQL Table:

Use Cases for AWS Data Pipeline
Set up a sample pipeline in our development environment
Import a text file from an AWS S3 bucket into an Aurora instance
Send out notifications through SNS to [email protected]
Export / import the Data Pipeline definition.

Prerequisites:

A MySQL instance
Access to invoke Data Pipeline with appropriate permissions
A target database and target table
An SNS notification setup with the right configuration (a minimal CLI sketch of this setup follows the list)
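
A minimal sketch of that SNS setup from the CLI (the topic name and e-mail address are placeholders; substitute the real recipient redacted above):

        # Create the notification topic and note the returned TopicArn
        aws sns create-topic --name datapipeline-notifications

        # Subscribe an email endpoint; the recipient must confirm the subscription email
        aws sns subscribe --topic-arn <topic-arn> --protocol email --notification-endpoint user@example.com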

Steps to Follow:

Create the Data Pipeline with a name
Create the MySQL schema and table
Configure your EC2 resource (make sure the EC2 instance has access to the MySQL instance).
If the MySQL instance allows only certain IPs and VPCs, then you need to configure your EC2 resource in the same VPC or subnet.
Configure the data source and the appropriate data format (notice this is a pipe-delimited file, not a CSV file).
Configure your SQL insert statement
Configure SNS notifications for PASS / FAIL activity.
Run your pipeline and troubleshoot if errors occur.


Data Pipeline JSON Definition:

AWS_Data_PipeLine_S3_MySQL_Defintion.json

Create Table SQL :
 
                create table users_01(
                userid integer not null primary key,
                username char(8),
                firstname varchar(30),
                lastname varchar(30),
                city varchar(30),
                state char(2),
                email varchar(100),
                phone char(14),
                likesports varchar(100),
                liketheatre varchar(100),
                likeconcerts varchar(100),
                likejazz varchar(100),
                likeclassical varchar(100),
                likeopera varchar(100),
                likerock varchar(100),
                likevegas varchar(100),
                likebroadway varchar(100),
                likemusicals varchar(100)) 
 
                INSERT INTO `ETLStage`.`users_01`
                (`userid`,
                `username`,
                `firstname`,
                `lastname`,
                `city`,
                `state`,
                `email`,
                `phone`,
                `likesports`,
                `liketheatre`,
                `likeconcerts`,
                `likejazz`,
                `likeclassical`,
                `likeopera`,
                `likerock`,
                `likevegas`,
                `likebroadway`,
                `likemusicals`)
                VALUES
                (
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?,
                ?
                ); 
Errors Encountered:

errorMessage : Quote character must be defined in record format

https://stackoverflow.com/questions/26111111/data-pipeline-error-on-a-template-from-rds-to-s3-copy

You can use “TSV” type as your custom format type and provide:

  • “Column separator” as pipe(|),
  • “Record separator” as new line(\n),
  • “Escape Char” as backslash(\) or any other character you want.

errorId : ActivityFailed:SQLException
errorMessage : No value specified for parameter
errorMessage : Parameter index out of range (1 > number of parameters, which is 0).
errorMessage : Incorrect integer value: ‘FALSE’ for column ‘likesports’ at row 1

Ensure the table column data types are set correctly. By default, MySQL does not convert TRUE / FALSE into a Boolean data type.
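
A quick way to double-check the target column types before rerunning the pipeline (a sketch using the mysql client; the endpoint and user are placeholders):

        # Show the column definitions of the target table
        mysql -h <aurora-endpoint> -u <user> -p -e "DESCRIBE ETLStage.users_01;"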

errorMessage : Parameter index out of range (1 > number of parameters, which is 0).
errorMessage for Load script: ERROR 1227 (42000) at line 1: Access denied; you need (at least one of) the LOAD FROM S3 privilege(s) for this operation

Image2Docker is a PowerShell module which ports existing Windows application workloads to Docker. It supports multiple application types, but the initial focus is on IIS and ASP.NET apps. You can use Image2Docker to extract ASP.NET websites from a VM – or from the local machine or a remote machine. Then you can run your existing apps in Docker containers on Windows, with no application changes.

Image2Docker also supports Windows Server 2012, with support for 2008 and 2003 on its way. The websites on the source VM are a mixture of technologies – ASP.NET WebForms, ASP.NET MVC, ASP.NET WebApi, together with a static HTML website.

To learn more about Image2Docker, please visit the following link

https://github.com/docker/communitytools-image2docker-win

Microsoft Windows 10 and Windows Server 2016 introduced new capabilities for containerizing applications. There are two types of container formats supported on the Microsoft Windows platform:

  • Hyper-V Containers – Containers with a dedicated kernel and stronger isolation from other containers
  • Windows Server Containers – application isolation using process and namespace isolation, and a shared kernel with the container host

Prerequisite
  • PowerShell 5.0 needs to be installed to use Image2Docker.

      Download URL: https://www.microsoft.com/en-us/download/details.aspx?id=50395

  • Image2Docker generates a Dockerfile which you can build into a Docker image. The system running the ConvertTo-Dockerfile command does not need Docker installed, but you will need Docker setup on Windows to build images and run containers.

Installation
  • Open PowerShell with administrative privileges. Run the following commands
     
                    
                    Install-Module Image2Docker
                    Import-Module Image2Docker
                     
  • You can validate the presence of the Install-Module command by running Get-Command -Module PowerShellGet -Name Install-Module (a quick sketch of this check follows). If the PowerShellGet module or the Install-Module command is not available, you may not be running a supported version of PowerShell; make sure you are running PowerShell 5.0 or later on the machine.
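    A minimal sketch of those checks:
     
                    # Confirm PowerShellGet is available and that PowerShell is 5.0 or later
                    Get-Command -Module PowerShellGet -Name Install-Module
                    $PSVersionTable.PSVersion
                     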

Usage
  • Image2Docker can inspect web servers and extract a Dockerfile containing some or all of the websites configured on the server. ASP.NET is supported, and the generated Dockerfile will be correctly set up to run .NET 2.0, 3.5 or 4.x sites.
  • Image2Docker supports the following source types:
    • Local Machines
    • Remote Path
    • Disk Images

The following commands show how to set up and run Image2Docker on local machines. Instructions on how to run it against a remote path and disk images will be covered in future blog posts.

Local Machines
  • This mode looks for IIS installed on the local machine and converts the IIS sites, virtual directories, and applications to a Dockerfile and associated artifacts.
  • Run the following command
     
                    
                     ConvertTo-Dockerfile `
                     -Local `
                     -OutputPath {{OutputPath}} `
                     -Artifact IIS  `	
                     -Verbose
                     
  • The Local parameter is used for IIS discovery on the local machine.
  • The OutputPath parameter specifies the location to store the generated Dockerfile and associated artifacts.
  • The Artifact parameter specifies which artifact to inspect. In our case this is IIS.
  • The Verbose parameter is optional; it prints all the verbose logs.
  • The following is a sample command.
     
                    
                    ConvertTo-Dockerfile -Local -OutputPath c:\docker_repo\iis -Artifact IIS -Verbose
                     

This is a continuation of the previous posts that covered how to set up and run Image2Docker.

Docker Installation Status
  • Open a PowerShell prompt and execute the following command.
  • docker info
  • Docker is already installed on the system if the command returns details about the Docker client and engine.
  • Docker is not installed on the machine if the command instead returns an error saying the term 'docker' is not recognized.


Install Docker if not exists
  • Please follow the instructions below if Docker is not installed on your machine.
  • Install the Docker-Microsoft PackageManagement Provider from the PowerShell Gallery.
    Install-Module -Name DockerMsftProvider -Repository PSGallery -Force
  • Next, you use the PackageManagement PowerShell module to install the latest version of Docker.
    Install-Package -Name docker -ProviderName DockerMsftProvider
  • When PowerShell asks you whether to trust the package source ‘DockerDefault’, type A to continue the installation. When the installation is complete, reboot the computer.
    Restart-Computer -Force
        Tip: If you want to update Docker later:
        Check the installed version with
     
                    Get-Package -Name Docker -ProviderName DockerMsftProvider 

    Find the current version with    

     
                    Find-Package -Name Docker -ProviderName DockerMsftProvider 

    When you’re ready, upgrade with

     
                    Install-Package -Name Docker -ProviderName DockerMsftProvider -Update -Force 
     
                    Start-Service Docker 
  • Ensure your Windows Server system is up to date. Run the following command.
     
                    Sconfig 
    • This shows a text-based configuration menu, where you can choose option 6 to Download and Install Updates.
       
                      
                      ===============================================================================
                                               Server Configuration
                      ===============================================================================
                      
                      1) Domain/Workgroup:                    Workgroup:  WORKGROUP
                      2) Computer Name:                       WIN-HEFDK4V68M5
                      3) Add Local Administrator
                      4) Configure Remote Management          Enabled
                      
                      5) Windows Update Settings:             DownloadOnly
                      6) Download and Install Updates
                      7) Remote Desktop:                      Disabled
                      ...
                       
    •  When prompted, choose option A to download all updates.
Create Containers from the Image2Docker Dockerfile.
  • Make sure that Docker is installed on your Windows Server 2016 or Windows 10 with the Anniversary Update.
  • To build that Dockerfile into an image, run:
     
                    docker build -t img2docker/aspnetwebsites .
  • Here img2docker/aspnetwebsites is the name of the image. You can give your own name based on your needs.
  • When the build completes, we can run a container to start the ASP.NET sites.
  • This command runs a container in the background, exposes the app port, and stores the ID of the container.
     
                    $id = docker run -d -p 81:80 img2docker/aspnetwebsites 

    Here 81 is the host port number and 80 is the container port number.

  • When the site starts, we will see in the container logs that the IIS Service (W3SVC) is running:
     
                    docker logs $id 

    The Service ‘W3SVC’ is in the ‘Running’ state.

  • Now you can browse to the site running in IIS in the container, but because published ports on Windows containers don’t do loopback yet, if you’re on the machine running the Docker container, you need to use the container’s IP address:
     
                    $ip = docker inspect --format '{{ .NetworkSettings.Networks.nat.IPAddress }}' $id
                    start http://$($ip)

That will launch your browser and you’ll see your ASP.NET Web application running in IIS, in Windows Server Core, in a Docker container.
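
To check the site from the host without opening a browser, something like the following can be used (a sketch that reuses the $ip variable captured above):

                    # Request the containerized site and print the HTTP status code
                    (Invoke-WebRequest "http://$ip" -UseBasicParsing).StatusCode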

This is a continuation of the previous blog post on GMSA setup.

Step 1: Create Docker Image
  1. I have created an ASP.NET MVC app, and it accesses SQL Server using Windows authentication.
  2. My connection string looks like the one below.
     
                    
                    <connectionStrings>
                    <add name="AdventureWorks2012Entities"
                    connectionString="metadata=res://*/ManagerEmployeeModel.csdl|res://*/ManagerEmployee
                    Model.ssdl|res://*/ManagerEmployeeModel.msl;provider=System.Data.SqlClient;provider 
                    connection string=&quot;data source=CIQSQL2012;initial
                    catalog=AdventureWorks2012;integrated
                    security=True;MultipleActiveResultSets=True;App=EntityFramework&quot;"
                    providerName="System.Data.EntityClient" />
                    </connectionStrings>
                     
  3. I have created the Dockerfile and the necessary build folders using Image2Docker. Refer to Image2Docker.
  4. The Dockerfile looks like the one below.
     
                    
                    # escape=` 
                    FROM microsoft/aspnet:3.5-windowsservercore-10.0.14393.1066 
                    SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"] 
                     
                    # disable DNS cache so container addresses always fetched from Docker 
                    RUN Set-ItemProperty -path 'HKLM:\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters' -Name ServerPriorityTimeLimit -Value 0 -Type DWord 
                     
                    RUN Remove-Website 'Default Web Site'; 
                     
                    RUN Enable-WindowsOptionalFeature -Online -FeatureName IIS-ApplicationDevelopment,IIS-ASPNET,IIS-ASPNET45,IIS-CommonHttpFeatures,IIS-DefaultDocument,IIS-DirectoryBrowsing,IIS-HealthAndDiagnostics,IIS-HttpCompressionStatic,IIS-HttpErrors,IIS-HttpLogging,IIS-ISAPIExtensions,IIS-ISAPIFilter,IIS-NetFxExtensibility,IIS-NetFxExtensibility45,IIS-Performance,IIS-RequestFiltering,IIS-Security,IIS-StaticContent,IIS-WebServer,IIS-WebServerRole,NetFx4Extended-ASPNET45 
                     
                    # Set up website: MyGSMAMvc 
                    RUN New-Item -Path 'C:\inetpub\wwwroot\MyAspNetMVC_GSMA' -Type Directory -Force;  
                     
                    RUN New-Website -Name 'MyGSMAMvc' -PhysicalPath 'C:\inetpub\wwwroot\MyAspNetMVC_GSMA' -Port 80 -Force;  
                     
                    EXPOSE 80 
                     
                    COPY ["MyAspNetMVC_GSMA", "/inetpub/wwwroot/MyAspNetMVC_GSMA"] 
                     
                    RUN $path='C:\inetpub\wwwroot\MyAspNetMVC_GSMA'; ` 
                        $acl = Get-Acl $path; ` 
                        $newOwner = [System.Security.Principal.NTAccount]('BUILTIN\IIS_IUSRS'); ` 
                        $acl.SetOwner($newOwner); ` 
                        dir -r $path | Set-Acl -aclobject  $acl 
                    
                     
  5. Move the necessary files to cloud-2016.
  6. Log in to the cloud-2016 server.
  7. Create the image using the command below. Refer to Docker commands.
     
                    
                    docker build -t myaspnetmvc/gmsa .
Step 2: Create Container
  1. When you create the Docker container, you need to specify additional configuration to utilize the gMSA. Execute the command below.
     
                    
                    docker run -d --security-opt "credentialspec=file://Gmsa.json" myaspnetmvc/gmsa 
  2. Or execute the commands below
     
                     $id = docker run -d --security-opt "credentialspec=file://Gmsa.json" myaspnetmvc/gmsa
                     docker logs $id
                     $ip = docker inspect --format '{{ .NetworkSettings.Networks.nat.IPAddress }}' $id
                     start http://$($ip)
                     
  3. Browse to the appropriate page; you should see the DB records.
  4. You can test Active Directory communication as shown below. 
    1. Log in to the running Docker container using the docker exec command and check whether you can, in fact, communicate with Active Directory. Execute nltest /parentdomain to verify.
       
                      docker exec -it 0974d72624eb powershell 
                      nltest /parentdomain 
                      cloudiq.local. (1) 
                      The command completed successfully
                        

CloudIQ Tech, a growing cloud company that helps businesses, big or small, make the right cloud move to realize the true economies of cloud, has announced that it has achieved Gold status for the Microsoft Cloud Platform Competency. The gold level is the highest Microsoft partner level, putting CloudIQ in an exclusive category with the other top partners.

The milestone achievement demonstrates CloudIQ Tech’s deep commitment, vast expertise in Microsoft cloud solutions and its team’s willingness to acquire in-depth knowledge and proficiency in Cloud tools and solutions while uniquely aligning them to evolving Microsoft’s Cloud Strategy and Competency goals. It is to be noted that to earn a Microsoft Gold Competency Certification, partner’s team members must successfully demonstrate their level of technology expertise in general, and deep knowledge of Microsoft and its products in particular. It is a valuable recognition by Microsoft for its partner’s holistic expertise in designing, migrating, integrating and delivering Windows-based applications and infrastructure solutions in the cloud using the Microsoft platform.

Commenting on the occasion, Mr. Prem Kumar Kandalu, CEO of CloudIQ Tech, said ”By achieving a Gold Competency, our dream to be part of the distinguished top 1 percent of Microsoft’s partner ecosystem has come true. This is a major step towards our objective of becoming a well known strategic player in Microsoft Cloud Solutions. Already within a short span of time we had become an Azure Gold Partner and now this Gold status for Microsoft Cloud Platform Competency will help us deliver cloud solutions with more confidence so that our customers drive innovative solutions on the latest Microsoft technology and move ahead successfully”.

About CloudIQ Tech:

CloudIQ Tech is a technology company helping businesses get the best out of emerging technologies, innovation and creative ideas. Our firm conviction that cloud is the way to go has enabled us to invest considerable time and efforts in R&D, focusing in designing, building, and managing cloud infrastructures and solutions that are uncomplicated, easily deployable, scalable while delivering the much needed edge from day one to our customers. The efforts are continual, ably supported by our team of cloud technical experts holding the highest possible certificate levels in designing, developing and implementing AWS and Azure cloud-based solutions.

Today our portfolio includes a range of Solutions & Services that comprise Cloud Consulting, Cloud Migration, Cloud Infrastructure Management services and Managed Cloud services besides DevOps Orchestration and home grown cloud apps and products. These cloud solutions empower people and organizations to innovate, increase operational efficiency, find opportunities to reduce cost and increase profits, and stay ahead of competition


This blog post explains how to set up and configure a SQL Server Docker container on a Linux machine. Microsoft recently started supporting SQL Server on Linux, and the entire setup takes only a few steps.

Install SQL Server Docker Image
 

# Pull the SQL Server image from the Docker registry
$docker pull microsoft/mssql-server-linux
$docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=Password' -p 1433:1433 -d microsoft/mssql-server-linux
$docker exec -it 40dd "bash"
$/opt/mssql-tools/bin/sqlcmd -S localhost -U SA
Password:
 


Command History
[root@ip-10-0-0-110 ec2-user]# docker pull microsoft/mssql-server-linux
Using default tag: latest
latest: Pulling from microsoft/mssql-server-linux
4c0c60131530: Pull complete
Digest: sha256:604d27fe5d3d9b4434fb1657e9bf4f2c2bf55ea9bd29dc0cb3660d84bc6f56a8
Status: Downloaded newer image for microsoft/mssql-server-linux:latest
[root@ip-10-0-0-110 ec2-user]# docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=Password' -p 1433:1433 -d microsoft/mssql-server-linux
40dde973f4a0cc2af469f9d1c2182403d1e22e28c2a8821e29ce832529965513
[root@ip-10-0-0-120 ec2-user]# docker -it 40dd "bash"
flag provided but not defined: -it
See 'docker --help'.
[root@ip-10-0-0-120 ec2-user]# docker exec -it 40dd "bash"
root@40dde973f4a0:/# /opt/mssql-tools/bin/sqlcmd -S localhost -U SA
Password:
1> select @@servername;
2> go

--------------------------------------------------------------------------------------------------------------------------------
40dde973f4a0

(1 rows affected)
1> select db_name();
2> go

--------------------------------------------------------------------------------------------------------------------------------
master

(1 rows affected)
Version Info:
 

select @@version

/*
Microsoft SQL Server 2017 (CTP2.1) - 14.0.600.250 (X64) 
May 10 2017 12:21:23 
Copyright (C) 2017 Microsoft Corporation. All rights reserved.
Developer Edition (64-bit) on Linux (Ubuntu 16.04.2 LTS)
*/
Backup Database on Docker Container and Copy to Host:

Connect to SQL Server Management Studio or SQLCMD and issue the backup command

Backup Database on Docker Container
 

BACKUP DATABASE H1BData_V2
TO DISK ='/var/opt/mssql/data/SalaryDatabase_V2_06132017.bak'
Copy the file from Container to Host and Sync with S3 Bucket:
 
 $ docker cp <containerId>:/file/path/within/container /host/path/target

$ docker cp aabb19ca439f:/var/opt/mssql/data/SalaryDatabase_06132017.bak /Docker/

$ aws s3 sync ./ s3://docker-backups

upload: ./SalaryDatabase_06132017.bak  to s3://docker-backups/SalaryDatabase_06132017.bak

Completed 18.4 GiB/25.8 GiB (46.4 MiB/s) with 1 file(s) remaining
Troubleshooting:
 

root@e83b4048db28:/var/opt/mssql/log# /opt/mssql-tools/bin/sqlcmd -S localhost -U SA
Password:
Sqlcmd: Error: Microsoft ODBC Driver 13 for SQL Server : Login failed for user ‘SA’..
root@e83b4048db28:/var/opt/mssql/log# exit
exit
[root@ip-10-0-0-120 ec2-user]# docker rm $(docker ps -a -q)
Error response from daemon: You cannot remove a running container e83b4048db28505951f20fff4aff9f5132695fd1e1c7251c8daeb79d15ac403d. Stop the container before attempting removal or use -f
[root@ip-10-0-0-120 ec2-user]# docker rm -f $(docker ps -a -q)
e83b4048db28
Unable to telnet without allowing access to port 1433
MacBook:.ssh Raju$ telnet 54.44.40.26 1433
Trying 54.44.40.26…
telnet: connect to address 54.44.40.26: Operation timed out
telnet: Unable to connect to remote host
MacBook:.ssh Raju$ telnet 54.44.40.26 1433
Trying 54.44.40.26…
Connected to ec2-54-44-40-26.us-west-2.compute.amazonaws.com.
Escape character is ‘^]’.
^C

Error while connecting through SQL Server Management Studio without allowing access to port 1433
TITLE: Connect to Server
——————————
Cannot connect to 54.44.40.26.
——————————
ADDITIONAL INFORMATION:

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server) (Microsoft SQL Server, Error: 53)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&EvtSrc=MSSQLServer&EvtID=53&LinkId=20476

———-

The network path was not found

———

BUTTONS:

OK

———

 

Error while connecting through SQLCMD without allowing access to port 1433

C:\Users\>sqlcmd -S 54.44.40.26 -U SA
Password: HResult 0x35, Level 16, State 1
Named Pipes Provider: Could not open a connection to SQL Server [53].
Sqlcmd: Error: Microsoft SQL Server Native Client 10.0 : A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online..
Sqlcmd: Error: Microsoft SQL Server Native Client 10.0 : Login timeout expired.

Open port 1433 in Security Groups

Allow inbound traffic from the IPs or security groups that need SQL Server access.
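
From the AWS CLI the same change looks roughly like this (the security group ID and CIDR range are placeholders for illustration):

# Allow inbound SQL Server traffic (TCP 1433) from a specific CIDR range
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 1433 --cidr 203.0.113.0/24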



Restart Docker Container and See Docker Logs

 

docker ps -a
docker restart bb1b1
docker logs bb
SQL Server Docker Container Errors:
 

2017-06-09 00:08:39.35 spid9s Starting up database ‘tempdb’.
2017-06-09 00:08:39.45 spid26s Recovery of database ‘UserDBName’ (7) is 0% complete (approximately 1717 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2017-06-09 00:08:40.15 spid9s The tempdb database has 1 data file(s).
2017-06-09 00:08:40.16 spid36s The Service Broker endpoint is in disabled or stopped state.
2017-06-09 00:08:40.17 spid36s The Database Mirroring endpoint is in disabled or stopped state.
2017-06-09 00:08:40.35 spid36s Service Broker manager has started.
2017-06-09 00:08:41.05 spid33s [INFO] HkRecoverFromLogOpenRange(): Database ID: [5]. Log recovery scan from 00000495:00005D20:006B to 000004C7:00010F48:0002.
2017-06-09 00:08:59.47 spid26s Recovery of database ‘UserDBName’ (7) is 16% complete (approximately 110 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2017-06-09 00:09:11.16 Logon Error: 18456, Severity: 14, State: 38.
2017-06-09 00:09:11.16 Logon Login failed for user ‘UserName’. Reason: Failed to open the explicitly specified database ‘UserDBName’. [CLIENT: 00.000.00.00]
2017-06-09 00:09:19.51 spid26s Recovery of database ‘UserDBName’ (7) is 30% complete (approximately 94 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2017-06-09 00:09:39.59 spid26s Recovery of database ‘UserDBName’ (7) is 44% complete (approximately 76 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2017-06-09 00:09:51.38 spid26s Recovery of database ‘UserDBName’ (7) is 54% complete (approximately 62 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2017-06-09 00:09:51.40 spid26s Recovery of database ‘UserDBName’ (7) is 54% complete (approximately 62 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
DBSTARTUP (UserDBName, 7): FCBOpenTime took 164 ms
DBSTARTUP (UserDBName, 7): FCBHeaderReadTime took 135 ms
DBSTARTUP (UserDBName, 7): FileMgrPreRecoveryTime took 277 ms
DBSTARTUP (UserDBName, 7): MasterFilesScanTime took 144 ms
DBSTARTUP (UserDBName, 7): AnalysisRecTime took 1470 ms
DBSTARTUP (UserDBName, 7): RedoRecTime took 71938 ms
DBSTARTUP (UserDBName, 7): UndoRecTime took 4903 ms
DBSTARTUP (UserDBName, 7): PhysicalRecoveryTime took 73408 ms
DBSTARTUP (UserDBName, 7): PhysicalCompletionTime took 4913 ms
DBSTARTUP (UserDBName, 7): RecoveryCompletionTime took 102 ms
DBSTARTUP (UserDBName, 7): StartupInDatabaseTime took 136 ms
DBSTARTUP (UserDBName, 7): RemapSysfiles1Time took 125 ms
2017-06-09 00:10:11.44 spid6s Recovery of database ‘UserDBName’ (7) is 63% complete (approximately 55 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2017-06-09 00:10:31.48 spid6s Recovery of database ‘UserDBName’ (7) is 63% complete (approximately 65 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2017-06-09 00:10:34.61 spid33s [INFO] HkRedoCloseLastOpenRangeSegment(): Database ID: [5]. Log recovery open segment scan from 00000495:00005D20:006B to 000004C7:00010E78:002F.
2017-06-09 00:10:34.67 spid25s [INFO] redoOpenRangeSegment(): Database ID: [5]. Log recovery open segment scan completed at 000004C7:00010E78:002F.
2017-06-09 00:10:34.67 spid25s [INFO] HkPrintUndoRowStats(): Database ID: [5]. Undo Rows Stats. [UndoRowsSeen] = 0, [UndoRowsMatched] = 0, [InsertRowsMatched] = 0, [InsertRowsSeen] = 0, [UndoRowsAborted] = 0
DBSTARTUP (UserDBName, 5): FCBOpenTime took 202 ms
DBSTARTUP (UserDBName, 5): FCBHeaderReadTime took 133 ms
DBSTARTUP (UserDBName, 5): FileMgrPreRecoveryTime took 308 ms
DBSTARTUP (UserDBName, 5): MasterFilesScanTime took 158 ms
DBSTARTUP (UserDBName, 5): StreamFileMgrPreRecoveryTime took 141 ms
DBSTARTUP (UserDBName, 5): LogMgrPreRecoveryTime took 478 ms
DBSTARTUP (UserDBName, 5): PhysicalCompletionTime took 116181 ms
DBSTARTUP (UserDBName, 5): HekatonRecoveryTime took 116167 ms
2017-06-09 00:10:34.84 spid24s Recovery completed for database UserDBName (database ID 5) in 117 second(s) (analysis 12 ms, redo 0 ms, undo 50 ms.) This is an informational message only. No user action is required.
2017-06-09 00:10:51.48 spid6s Recovery of database ‘UserDBName’ (7) is 71% complete (approximately 53 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2017-06-09 00:11:11.54 spid6s Recovery of database ‘UserDBName’ (7) is 99% complete (approximately 1 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2017-06-09 00:11:11.54 spid6s 23 transactions rolled back in database ‘UserDBName’ (7:0). This is an informational message only. No user action is required.
2017-06-09 00:11:11.55 spid6s Recovery is writing a checkpoint in database ‘UserDBName’ (7). This is an informational message only. No user action is required.
2017-06-09 00:11:11.55 spid6s Recovery completed for database UserDBName (database ID 7) in 154 second(s) (analysis 1405 ms, redo 71933 ms, undo 80147 ms.) This is an informational message only. No user action is required.
2017-06-09 00:11:11.56 spid6s Parallel redo is shutdown for database ‘UserDBName’ with worker pool size [2].
2017-06-09 00:11:11.57 spid6s Recovery is complete. This is an informational message only. No user action is required.

Errors while reading log file
EXEC sp_readerrorlog

Msg 22004, Level 16, State 1, Line 0
The log file is not using Unicode format.
===================================
The log file is not using Unicode format. (.Net SqlClient Data Provider)
——————————
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=14.00.0600&EvtSrc=MSSQLServer&EvtID=22004&LinkId=20476
——————————
Server Name: 52.42.36.22
Error Number: 22004
Severity: 16
State: 1

Running out of Space

Msg 3202, Level 16, State 1, Line 5 Write on “/var/opt/mssql/data/HWageInfo_06132017.bak” failed: Insufficient bytes transferred. Common causes are backup configuration, insufficient disk space, or other problems with the storage subsystem such as corruption or hardware failure. Check errorlogs/application-logs for detailed messages and correct error conditions. Msg 3013, Level 16, State 1, Line 5 BACKUP DATABASE is terminating abnormally. Ensure you have enough disk space.
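
Before retrying the backup, it helps to confirm how much space is left on the volume backing /var/opt/mssql (a quick check; <containerId> is the same placeholder used in the docker cp example above):

# Check free space inside the container where SQL Server keeps data and backups
docker exec -it <containerId> df -h /var/opt/mssql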

References:

https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-docker
https://github.com/Microsoft/mssql-docker/issues/55
https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-troubleshooting-guide

Business Needs:

Our goal is to identify whether the Amazon SQL Server RDS service provides an elastic, highly available, scalable, and operationally efficient solution for our use case. We are evaluating options to migrate our read/write-heavy production SQL Server database to Amazon SQL Server RDS. We have pretty high throughput needs for a few hours a day, for a few months of the year, which is mission critical for our business success. Any downtime during peak usage would be catastrophic for our business. We are evaluating the pros and cons of moving to Amazon RDS with Provisioned IOPS.

Caveats of AWS RDS SQL Server:

We gathered this information while working with SQL Server RDS.

Feature Name | Yes/No | Description
SQL Server 2016 Support | No | Microsoft says SQL Server 2016 comes with a very rich feature set and a ton of OLTP enhancements
Native Backup Restore | Yes | AWS RDS released this feature a week ago, which makes moving databases across environments a lot easier
Elastic IOPS | No | Storage and IOPS need to be incremented linearly for higher performance. You can't get higher IOPS without increasing storage.
Elastic Storage | No | Scaling storage is not an option after launching an instance
RAID Support | No | We usually have RAID 10 for production workloads, and RDS doesn't have options to configure RAID
Point-in-Time Restore on Same Instance | No | You can't do a point-in-time restore on the existing database. You have to spin up a new instance
AlwaysOn Availability Groups | No | This provides the ability to fail over a group of databases to your secondary instance
Mirroring | Yes | Mirroring is a deprecated feature and is replaced by AlwaysOn Availability Groups
Linked Servers from RDS | No | But linked servers to RDS are allowed
Service Broker | No | Comes in handy for services

  No admin privileges. You can't execute normal SQL Server system stored procedures; you need to work with option groups and parameter groups to modify configuration. You can't execute sp_configure to change configurations.
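
For example, instead of sp_configure, an instance-level setting is changed through a custom DB parameter group attached to the instance. Roughly, from the AWS CLI (the parameter group name is a placeholder):

# Adjust a SQL Server setting via the DB parameter group rather than sp_configure
aws rds modify-db-parameter-group --db-parameter-group-name my-sqlserver-params --parameters "ParameterName='max degree of parallelism',ParameterValue=2,ApplyMethod=immediate"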

Remote Host Identification SSH Error
 

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:oFgx2U4Q0KUxtFnouxHYJdLyH6qDbX/rKtsg
Please contact your system administrator.
Add correct host key in /Users/username/.ssh/known_hosts to get rid of this message.
Offending RSA key in /Users/ username /.ssh/known_hosts:1
RSA host key for 54.214.1.26 has changed and you have requested strict checking.
Host key verification failed.
Edit known_hosts, remove the offending host entry, and SSH into the host again.
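
As an alternative to editing known_hosts by hand, the stale entry can be removed with ssh-keygen (the IP below is the host from the session above):

ssh-keygen -R 54.214.1.26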
 

username -MBP:.ssh username $ vi known_hosts
username -MBP:.ssh username $ ssh -i Key.pem [email protected]
The authenticity of host ‘54.214.1.26 (54.214.1.26)’ can’t be established.
ECDSA key fingerprint is SHA256:2VcR+DiKNRwyQwZ2LqtZ6EHxQYv5MWMAsrrI.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ‘54.214.1.26’ (ECDSA) to the list of known hosts.
Last login: Fri Jul 29 23:08:35 2016 from
__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2016.03-release-notes/
4 package(s) needed for security, out of 4 available
Run “sudo yum update” to apply all updates.
