In this blog, we’ll explore how to create a web application using Replit AI Agent, an exciting tool that can turn your ideas into working applications with just the help of clear prompts. Wondering what a Replit Agent is and how the whole process works? Let’s break it down! 

What is Replit Agent? 

The Replit Agent is an AI-powered development assistant that can build complete applications based on simple, well-written prompts from users. It supports multiple programming languages, but by default, the Replit Agent is most comfortable using: 

  • React for front-end development.
  • Express.js for the backend.
  • PostgreSQL for the database. 

With just a clear set of instructions, the Replit Agent can set up both the front-end and back-end, creating a solid foundation for modern web applications. 

What Does It Take to Build an Application Using Replit Agent? 

At the heart of any application built with Replit Agent lies one key element: the prompt.

A good prompt is like a blueprint. It should clearly define the features you want, along with any do’s and don’ts, so the AI can understand your exact requirements. 

A few golden rules for writing effective prompts: 

  • Avoid overloading your prompt with too many instructions at once. A crowded or overly complex prompt can cause the Agent to stray from your intended outcome. 
  • Break your project down into smaller parts and write feature-specific prompts. This helps the Agent focus on one piece at a time, reducing confusion and improving accuracy. 
  • Even though the Replit Agent is highly capable, you should always run tests once the feature is generated. Minor bugs or differences from the expected behavior can occur. 
  • If you find any bugs or mismatches, create a fixing prompt that clearly explains the difference between the actual and expected behavior; attaching a screenshot, if possible, makes it even clearer for the Agent. 

For this Timesheet Tracker project, simple Word documents were used to write out each prompt in the same clear structure, with screenshots added whenever available, to guide the Agent. 

A Quick Note on Tech Stack 

By default, Replit Agent uses React with Tailwind CSS for the front-end, and an in-memory database for storing data during development. However, for real-world applications, you’ll likely need a proper database setup. 

To do that, all you need to do is mention it in your prompt. The Agent will automatically: 

  • Set up a serverless PostgreSQL database, 
  • Create the necessary scripts and schemas, 
  • Push the schema to the database, 
  • Migrate your data handling from in-memory storage to PostgreSQL. 

You can even query and view your tables directly in the serverless database environment, making it easy to validate and manage your application data. 

Timesheet Tracker Development using Replit

Prompt 1: The Initial Requirements and Features

When it comes to building an application using Replit AI Agent, everything starts with the right prompt. Think of the prompt as your project’s blueprint: it tells the Agent exactly what you want to build, how it should behave, and what features you expect. 

A clear and structured prompt will make your development smoother and help the AI Agent understand your goals without confusion. 

Here’s what you should keep in mind while writing a prompt, using the Timesheet Tracker example: 

  • Clarity is key 
    The prompt should explain the requirements in a straightforward and easy-to-understand way. Each feature, like user roles, timesheet entry, approval flow, and reporting, should be broken down clearly so the Agent knows exactly what to build. 
  • Structure your requirements 
    Just like the Timesheet Tracker prompt, group your features logically. For example, User Management, Timesheet Entry, Leave Management, Reporting, and so on. This helps both you and the AI stay organized. 
  • Mention region-specific needs 
    If your app has regional differences (like the USA vs. India workweeks or holidays), specify those in the prompt so the Agent can handle them correctly. 
  • Role-Based Access is important 
    Define different user roles clearly (Admin, Manager, Employee, HR/Finance) and mention what each role can and cannot do. This avoids confusion later. 
  • Don’t forget reports and exports 
    If you need specific reports or formats (Excel, PDF, date ranges, project-wise, etc.) — mention them upfront. This will help the Agent include these features from the start. 
  • User-friendly notifications 
    Features like daily or weekly reminders and approval alerts should be included if you want smooth user interactions. 
  • Real-world workflows matter 
    Including approval steps, comments, and reminders ensures your app feels natural and helpful for real users, not just technically functional. 

A Few Extra Tips for Writing a Good Prompt: 

  • Keep it detailed but easy to follow; avoid overloading the prompt with too much in one go. 
  • When possible, split large features into separate prompts; this keeps the AI Agent focused. 
  • Include expected behaviors, user scenarios, or screenshots to give clear context. 
  • Be specific about tools and tech stack, like telling the Agent to switch from in-memory to Postgres. 
  • Always test the results, even when the Agent completes the build — minor bugs or misinterpretations can happen, and follow-up prompts help fix them. 

A well-crafted initial prompt is the secret to making Replit Agent work for you, not against you. The more precise you are, the closer the Agent’s output will match your vision. 

Below is a screenshot of the Timesheet Tracker application developed by the Replit Agent. 

Prompt 2: Bug Identification and Fixing with Replit Agent 

Even though Replit AI Agent does an excellent job generating full-featured applications from well-structured prompts, like any development process, it’s common to encounter minor bugs or unexpected behaviors. The key is to treat these moments as part of the iterative process. 

Once the initial version of the Timesheet Tracker Application was generated, real-time testing was conducted to make sure all the core features worked as intended. During this phase, a set of common functional bugs was identified; these were compiled and addressed by sending the Agent new, targeted prompts that clearly mentioned: 

  • The Current Behavior (what’s happening).
  • The Expected Behavior (what should happen).
  • Screenshots (if available) for extra context. 

Here’s a quick snapshot of Bug List 1, the first set of issues encountered: 

  • Bug 1: Timesheet Approval/Rejection Error 
    When a Manager clicks “Confirm Approval/Rejection,” an error occurs — 
    value.toISOString is not a function. 
    → The timesheet should approve successfully and show in the approved list. 
  • Bug 2: Adding Team Members to a Project 
    Adding team members accidentally adds the logged-in user instead of the selected one and even creates duplicates. 
    → It should correctly assign only the chosen team member to the project. 
  • Bug 3: Timesheet Daily/Weekly Toggle Not Working 
    The toggle between Daily and Weekly views has no effect on the display. 
    → Switching the view should update the timesheet display accordingly. 
  • Bug 4: Missing Auto-Refresh in Tables 
    After creating a new entry (timesheet, project, task, user, holiday), the new record doesn’t appear unless the page is manually refreshed. 
    → Newly added records should appear automatically in the table. 
  • Bug 5: Incorrect Approval Count Displayed 
    The side menu always shows “3” for pending approvals, even if the actual number is different. 
    → The count should reflect the real number of pending approval requests. 
  • Bug 6: View Full Calendar Not Working 
    Clicking “View Full Calendar” results in no action and no calendar is displayed. 
    → The app should show a full calendar with region-specific holidays configured by the Admin. 

Once these bugs were documented, fixing prompts were used to guide the Agent with clear instructions on what went wrong and what the expected behavior should be. Adding screenshots wherever possible helped the Agent better understand UI-related issues, too. 

This approach of testing, identifying, and fixing bugs through follow-up prompts highlights the importance of combining human testing with AI-generated builds. While the Replit Agent can save a huge amount of development time, it still benefits from human eyes to ensure everything works just the way you intended. 

Checklist – Key Steps to Build with Replit Agent

  • Write clear, structured prompts. 
  • Split complex features into separate prompts. 
  • Test every feature thoroughly. 
  • Use fixing prompts to address bugs. 
  • Validate and deploy. 

Conclusion 

Building applications with the Replit AI Agent opens up an entirely new way of approaching software development, one where clear communication through well-structured prompts becomes just as important as coding itself. 

Throughout this Timesheet Tracker project, you’ve seen how an idea can turn into a working application simply by guiding the Replit Agent with clear instructions and thoughtful iteration. From defining your core functional requirements to identifying and fixing real-world bugs, the development process with AI still mirrors traditional software development in many ways, especially when it comes to testing and refining. 

The biggest takeaway? 

AI like Replit Agent isn’t here to replace developers; it’s here to accelerate your journey from concept to completion. The quality of the output depends entirely on the clarity of your inputs. If you learn to write structured, specific, and scenario-based prompts, you can dramatically reduce development time while maintaining the flexibility to adjust and correct issues as they arise.

So, whether you’re an experienced developer or someone experimenting with AI-powered app building for the first time, Replit Agent is a valuable companion that can help transform ideas into real, testable, and deployable applications, faster than ever. 

In the rapidly evolving business landscape, the competition for top talent has never been fiercer. Traditional hiring methods, such as manual resume screening, repetitive administrative tasks, and multiple rounds of interviews, are increasingly seen as inefficient and outdated.

Enter AI-based talent recruitment and HR automation technologies that are reshaping how organizations attract, engage, and hire talent. These smart solutions enable companies to streamline recruitment, make data-driven decisions, and create fairer and more inclusive workplaces.

How AI is Transforming Talent Recruitment

Artificial Intelligence (AI) is actively reinventing the recruitment process. Here’s how AI enhances talent acquisition:

1. Resume Screening at Scale

AI-powered platforms can analyze thousands of resumes in minutes. They automatically filter candidates based on job descriptions, skills, and experience, significantly cutting down the screening time, sometimes by up to 75%.

2. Predictive Hiring Analytics

Historical hiring data fuels predictive models that help identify candidates most likely to succeed in specific roles. Organizations can make smarter, more informed hiring decisions using these insights.

3. Bias Reduction

Properly designed AI minimizes unconscious bias by focusing on objective qualifications rather than subjective factors, supporting diversity and inclusion efforts.

4. Chatbots for Candidate Engagement

AI-driven chatbots interact with candidates 24/7, answering questions, guiding them through applications, and scheduling interviews, greatly improving the overall candidate experience.

Key Benefits of AI in HR and Recruitment

The integration of AI into HR processes provides numerous advantages:

1. Speed and Efficiency

AI automates repetitive tasks like candidate screening and interview scheduling, allowing HR professionals to focus on strategic initiatives.

2. Improved Candidate Quality

AI matches candidates not only to job descriptions but also considers cultural fit and future career paths, improving retention and performance.

3. Cost Reduction

Organizations save costs by minimizing dependency on agencies and reducing time-to-hire.

4. Enhanced Candidate Experience

Faster application processes, timely feedback, and personalized interactions enhance employer branding.

5. Data-Driven Decision Making

AI-driven insights into sourcing channels, hiring timelines, and employee turnover empower HR leaders to optimize recruitment strategies.

AI applications in HR include:

  • Resume screening and ranking.
  • Passive candidate sourcing.
  • Interview scheduling automation.
  • Streamlined employee onboarding.
  • Predicting employee performance.
  • Identifying attrition risks.

Real-World Examples: Naukri and LinkedIn

Naukri AI Integration

Naukri has started integrating AI mainly for resume screening, job matching, and candidate recommendations. Their AI models help recruiters find better fits faster by analyzing profiles, predicting candidate suitability, and even automating follow-ups. They’re also using AI chatbots to assist users during job applications.

AI Integration in Naukri Portal

LinkedIn AI Integration

LinkedIn is heavily invested in AI; they use it for personalized job recommendations, skill assessments, profile optimization tips, and content suggestions. Their AI also powers features like Recruiter AI (which helps companies find candidates more efficiently) and Career Explorer (which suggests career paths based on your skills).

AI Integration in LinkedIn Portal

Challenges and Considerations

Despite its benefits, AI in HR presents challenges:

  • Bias in algorithms due to flawed training data.
  • Privacy and data security concerns.
  • Loss of human touch in recruitment.
  • Transparency and fairness of AI decisions.

How to Successfully Implement AI in HR

To implement AI successfully in HR, consider the following steps:

1. Define Clear Goals

Identify pain points and set specific, measurable objectives.

2. Choose the Right Tools

Select AI tools that integrate with existing HR systems, are user-friendly, secure, and proven effective.

3. Monitor and Train Regularly

Continuously evaluate AI system performance and retrain models with fresh data to ensure fairness.

4. Keep the Human in the Loop

Maintain human oversight for final hiring decisions to preserve empathy and judgment.

5. Educate Your Team

Provide training on AI usage, ethics, and responsible application of AI-driven insights.

Emerging trends include:

  • AI-Generated Personalized Learning Paths.
  • Emotional AI measuring soft skills.
  • Blockchain for secure credentials.
  • Hyper-personalized employee experiences.

Conclusion

AI-based recruitment and HR automation are not futuristic concepts — they are today’s strategic advantages. Organizations that harness these technologies thoughtfully will shape a more agile, inclusive, and successful workforce for the future.

In today’s fast-evolving digital landscape, creating seamless and user-friendly experiences is at the heart of product and service design. To achieve this, understanding and optimizing user experience (UX) is crucial. Traditionally, UX testing and feedback analysis have been manual, time-consuming processes that rely heavily on human interpretation. However, the integration of artificial intelligence (AI) is reshaping the UX landscape, making these processes faster, more efficient, and increasingly insightful. Here’s how:

1. Automating UX Testing

AI-powered tools are now capable of automating the majority of UX testing processes. These tools can simulate user interactions, identify usability issues, and provide actionable insights. For example:

  • Heatmaps and Click Tracking: AI analyzes where users click or hover the most, predicting behavior and identifying navigation bottlenecks.
  • A/B Testing at Scale: AI can run simultaneous A/B tests, optimizing different designs or layouts to determine the best-performing option in real time.

2. AI-Driven Session Replay Analysis

Instead of manually watching hours of screen recordings, AI can cluster similar sessions and detect anomalies. For example, if users often abandon a checkout page, AI can spot the pattern and even identify the potential cause.

3. User Behavior Predictions

Machine learning models learn from past user behaviors to predict future trends. This enables designers to anticipate user needs and proactively refine UX elements. For instance:

  • Predicting which features users might find most useful.
  • Identifying potential churn risks based on interaction patterns.

4. Accessibility Improvements

AI is also playing a pivotal role in making digital products accessible to all users, including those with disabilities. For instance:

  • Automated suggestions for color contrast improvements.
  • Voice-to-text and text-to-voice functionalities for inclusivity.

5. Real-time Feedback Loop

One of AI’s most significant contributions to UX is enabling real-time feedback loops. Chatbots and AI assistants embedded in apps or websites collect feedback on-the-go, identifying and addressing issues instantaneously.

6. Reducing Human Bias

Human bias can often affect the interpretation of UX data. AI, when designed carefully, provides an objective analysis, ensuring that decisions are data-driven and user-centric.

Smarter Feedback Analysis with AI

User feedback, from app store reviews to survey responses, is a goldmine of insights. But when you’re dealing with thousands of comments, it’s nearly impossible to analyze them manually. AI makes feedback analysis faster and smarter.

1. Sentiment Analysis

AI uses Natural Language Processing (NLP) to analyze sentiment in user reviews and support tickets. It can:

  • Categorize feedback as positive, negative, or neutral.
  • Identify the emotional tone (frustration, delight, confusion).
  • Track sentiment trends over time.

This helps teams quickly gauge user satisfaction and identify pain points.
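
As a rough illustration, here is a minimal Python sketch of this kind of sentiment analysis using the Hugging Face transformers pipeline (the library choice and sample feedback are assumptions; the post does not name a specific tool):

from transformers import pipeline

# Load a default English sentiment model (downloaded on first use).
classifier = pipeline("sentiment-analysis")

# Illustrative feedback snippets, not real user data.
feedback = [
    "The new onboarding flow is confusing and slow.",
    "Love the redesigned dashboard, great job!",
]

for text, result in zip(feedback, classifier(feedback)):
    # Each result is a dict like {"label": "POSITIVE", "score": 0.99}.
    print(f"{result['label']} ({result['score']:.2f}): {text}")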

2. Feedback Clustering & Theme Detection

AI can automatically cluster similar feedback and extract common themes like “slow load time,” “confusing onboarding,” or “excellent support.” This eliminates the need for manual tagging and prioritizes issues based on frequency and severity.
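
As an illustration, here is a minimal clustering sketch with scikit-learn, using TF-IDF vectors and k-means (the feedback strings and cluster count are illustrative assumptions):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Illustrative feedback; in practice this would come from reviews or tickets.
feedback = [
    "Checkout page takes forever to load",
    "Very slow load time on the product page",
    "Onboarding was confusing, too many steps",
    "Could not figure out the onboarding wizard",
    "Support team was excellent and fast",
]

# Turn the text into TF-IDF vectors, then group similar feedback into themes.
vectors = TfidfVectorizer(stop_words="english").fit_transform(feedback)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

for cluster, text in sorted(zip(labels, feedback)):
    print(cluster, text)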

3. Voice of Customer (VoC) Insights

Combining sentiment, intent, and behavior analysis, AI delivers deeper Voice of Customer insights. It can even flag feedback from high-value users or influencers, helping teams prioritize responses.

Tools and Technologies Leading the Charge

Some popular AI-driven tools that are transforming UX testing and feedback analysis include:

  • Hotjar AI – Smart analysis of heatmaps and user sessions.
  • Qualtrics XM – Predictive models for customer experience.
  • MonkeyLearn – AI for text analysis and feedback tagging.
  • User Testing with AI – AI-driven insights from user test videos.
  • Chattermill – Unified AI platform for customer feedback analysis.

Benefits of AI in UX Testing and Feedback

  • Speed: Automate testing and analysis that would take weeks manually.
  • Scalability: Analyze millions of user sessions or reviews effortlessly.
  • Accuracy: Minimize human bias and error in interpreting feedback.
  • Proactive Fixes: Predict UX issues before they escalate.
  • User-Centric Design: Make data-driven improvements based on real behavior and sentiment.

Conclusion

By automating tedious tasks, uncovering hidden insights, and predicting user behavior, AI helps teams build better products, faster. As digital experiences become more complex, AI will be essential in crafting seamless, user-friendly interfaces that delight and retain users.

So, if you’re in UX design, product management, or QA, it’s time to embrace the AI wave and reimagine how user experience can be understood and improved.

As part of a modernization initiative, Amazon Q Developer, a next-generation developer assistant, has actively been leveraged to accelerate the migration of legacy .NET Framework applications to the latest .NET platforms, including .NET 6, .NET 7, and .NET 8. Amazon Q Developer is designed to support a broad spectrum of application types, making it a highly effective solution for transforming large-scale enterprise portfolios that span multiple layers of architecture and technology stacks.

Understanding Legacy .NET Framework Applications

Legacy .NET Framework applications are those built on older versions of .NET, such as .NET Framework 3.5 or early .NET Core releases. These applications often face limitations in terms of performance, security, and scalability, making modernization essential for maintaining operational efficiency and agility.

Benefits of Migrating to Modern .NET Platforms

Migrating to modern .NET platforms, such as .NET 8, offers several advantages:

  • Improved Performance: Modern .NET platforms provide better performance and resource management.
  • Cross-Platform Support: Applications can run on various operating systems, including Windows, Linux, and macOS.
  • Enhanced Security: Modern .NET platforms offer improved security features and long-term support (LTS).
  • Operational Efficiency: Modernization leads to operational efficiency and agility, enabling businesses to adapt to changing market demands.

Amazon Q Developer

Amazon Q Developer is a powerful tool designed to support the migration of legacy .NET Framework applications to modern .NET platforms. It supports a wide range of source project types, ensuring compatibility with even significantly aged applications. The tool provides a safe and structured code transformation process, pushing the transformed code into a separate branch to maintain the integrity of the original repository.

Migration Process

The migration process using Amazon Q Developer involves several steps:

  1. Assessing the Legacy Application: Identify the project types and assess the compatibility with Amazon Q Developer.
  2. Setting Up Amazon Q Developer: Configure the migration settings and prepare the tool for transformation.
  3. Performing Code Transformation: Execute the code transformation and push the transformed code into a separate branch. 

Transformation Steps in IDE:

1. Open any C# based solution or project in Visual Studio that you want to transform.

2. Open any C# code file in the editor.

3. Choose Solution Explorer.

4. From the Solution Explorer, right click a solution or project you want to transform, and then choose Port with Amazon Q Developer.

5. The Port with Amazon Q Developer window appears.

The solution or project you selected will be chosen in the Choose a solution or project to transform dropdown menu. You can expand the menu to choose a different solution or project to transform.

In the Choose a .NET target dropdown menu, choose the .NET version you want to upgrade to.

6. Choose Confirm to begin the transformation.

7. Amazon Q begins transforming your code. You can view the transformation plan it generates for details about how it will transform your application.

A Transformation Hub opens where you can monitor progress for the duration of the transformation. After Amazon Q has completed the Awaiting job transformation startup step, you can navigate away from the project or solution for the duration of the transformation.

8. After the transformation is complete, navigate to the Transformation Hub and choose View diffs to review the proposed changes from Amazon Q in a diff view.

9. Choose View code transformation summary for details about the changes Amazon Q made. You can also download the transformation summary by choosing Download summary as .md.

If any of the items in the Code groups table require input under the Linux porting status, you must manually update some files to run your application on Linux.

  1. From the Actions dropdown menu, choose Download Linux readiness report.
  2. A .csv file opens with any changes to your project or solution that you must complete before your application is Linux compatible. It includes the project and file that need to be updated, a description of the item to be updated, and an explanation of the issue. Use the Recommendation column for ideas on how to address a Linux readiness issue.
  3. To update your files in place, choose Accept changes from the Actions dropdown menu.
  4. Reviewing and Testing: Review the transformed code, test its functionality, and make necessary adjustments. 

Key Features and Capabilities

Amazon Q Developer offers several key features that make the migration process efficient:

  • Support for Legacy Project Types: Compatible with a wide range of source project types, including Web APIs, Libraries, Unit Tests, WCF, MVC and SPA.
  • Safe and Structured Code Transformation: Ensures the original repository remains untouched by pushing the transformed code into a separate branch.
  • Bulk transformation: Amazon Q Developer Web experience provides the ability to transform multiple projects in a single job from external repositories like GitHub and GitLab, without affecting the original source code. The transformation occurs in a newly created branch, ensuring the safety of the original source code.
  • Developer-Friendly Review and Control: Supports bulk transformation, generates necessary .NET 8 files, and integrates with IDEs like Visual Studio and VS Code for developer control.

Transformation Use Cases

Amazon Q Developer supports the transformation of various legacy application types, such as:

  • Web API Projects: Full transformation to .NET 8 with clean, minimal, and production-ready code. Some manual updates required for ApiController attributes and route mappings.
  • MVC Applications: Smooth migration of controllers and back-end logic layers, with manual adjustments needed for the UI layer and Razor views.
  • Class Libraries: Seamless migration with accurate updates to project format and references.
  • Unit Test Projects: Successful transformation to .NET 8 with validated test execution.

Key Findings and Observations

During the modernization process, we observed the following:

  • Web API Projects: Clean and minimal code with some manual updates required.
  • MVC Applications: Smooth migration with manual adjustments needed for the UI layer.
  • Class Libraries: Accurate updates to project format and references.
  • Unit Test Projects: Validated test execution with minimal alteration.
  • Projects with External DLL Dependencies: In certain scenarios, a new class is created that references external DLLs or private NuGet packages. It is essential to carefully review and apply changes to ensure compatibility.
  • HTTP Attribute Mismatch: REST API best practices for HTTP attributes are applied during transformation, but the original source may not have followed these practices. This can lead to potential compatibility issues for API consumers.       

Best Practices for a Successful Migration

To ensure a smooth and successful migration process, consider the following best practices:

  • Thorough Assessment: Assess the legacy application and identify compatibility with Amazon Q Developer.
  • Configuration and Setup: Properly configure the migration settings and prepare the tool for transformation.
  • Review and Testing: Review the transformed code, test its functionality, and make necessary adjustments.
  • Manual Adjustments: Be prepared to make manual adjustments for specific project types and dependencies.

Conclusion

Migrating legacy .NET Framework applications to modern .NET platforms using Amazon Q Developer offers several benefits, including improved performance, cross-platform support, enhanced security, and operational efficiency. The tool provides a safe and structured code transformation process, ensuring compatibility with even significantly aged applications.

Front-end development often focuses on design systems, data binding, and UI responsiveness. But with artificial intelligence (AI) becoming increasingly accessible, it’s time to explore how AI-powered automation can enhance Angular applications.

Automation is no longer confined to the backend. With modern APIs and cloud-based intelligence, Angular developers can now create front-end applications that predict, personalize, and even automate user interactions. This blog explores how AI and automation can be integrated into Angular apps to deliver smarter, more intuitive user experiences.

The Shift: From Interactive to Intelligent

Traditional web apps rely on user input and manual data. Smart apps anticipate needs, adapt to context, and automate tasks. AI bridges that gap.

In Angular apps, this means:

  • Predictive suggestions (e.g., auto-completed forms based on input context)
  • Automated workflows (e.g., booking systems that fill details intelligently)
  • Conversational UIs (e.g., chatbots for onboarding or support)
  • Smart content filtering (e.g., displaying content based on sentiment or preferences)

Use Case: Building a Smart Symptom Checker Form in Angular

Scenario:
A healthcare web app requires users to describe symptoms in order to book an appointment. Instead of selecting a specialty and time manually, this process can be automated based on the user’s input.

Solution:
We can use the OpenAI GPT API to classify the symptoms and automatically suggest:

  • Relevant medical specialty
  • Available doctors
  • Suggested appointment time slots

Workflow:
1. The user enters symptoms in a text box.
2. The input is sent to OpenAI via an Angular service.
3. The API returns a structured interpretation.
4. Form fields are auto-populated with suggestions.

Code Snippet: Angular + OpenAI Integration

import { Injectable } from '@angular/core';
import { HttpClient, HttpHeaders } from '@angular/common/http';
import { Observable } from 'rxjs';

@Injectable({ providedIn: 'root' })
export class AIService {
  constructor(private http: HttpClient) {}

  // Sends the user's symptom text to the OpenAI Chat Completions API
  // and returns the raw response as an Observable.
  getDiagnosis(input: string): Observable<any> {
    const headers = new HttpHeaders({
      'Authorization': `Bearer YOUR_OPENAI_API_KEY`,
      'Content-Type': 'application/json'
    });

    const body = {
      model: 'gpt-4',
      messages: [{ role: 'user', content: `Suggest a medical specialty for: ${input}` }]
    };

    return this.http.post('https://api.openai.com/v1/chat/completions', body, { headers });
  }
}

Tools to Boost Automation in Angular

Here’s a quick list of AI and automation tools Angular developers can leverage:

  • OpenAI API – Natural language understanding, chatbots
  • TensorFlow.js – On-device AI predictions
  • LangChain – Agentic workflows & intelligent chaining
  • Google Cloud AI APIs – Vision, speech, NLP services
  • Azure Cognitive Services – AI APIs with Angular-friendly SDKs

Other Real-World AI Automation Ideas for Angular Devs

  • Smart HR Portal: Auto-summarize resumes using AI and recommend roles.
  • IT Helpdesk Assistant: Triage tickets based on urgency/sentiment.
  • AI-Powered Dashboard: Show real-time insights and alerts based on historical data.
  • Voice-Controlled Interfaces: Combine Web Speech API + Angular to trigger app actions.
  • Personalized Content Feed: Filter content dynamically using user sentiment or interests.

The Future: Angular Meets AI-First Development

With Angular’s evolving architecture and growing support for reactive paradigms (like Signals), it’s becoming easier to integrate real-time data and reactive AI behavior.

As AI becomes mainstream in SaaS products, developers who know how to blend automation and intelligence into their apps will lead the next wave of innovation. Whether it’s through smart forms, personalized dashboards, or conversational UIs, Angular developers have all the tools they need to make the leap.

Conclusion

The future of the front-end isn’t just responsive; it’s intelligent. By integrating AI-powered automation into Angular apps, developers can craft user experiences that are faster, smarter, and more human-centric. No deep ML expertise is required to begin. Start small, experiment with APIs, and gradually build forward-thinking features. Angular is ready for the AI-first era.

💡Tip: Start by automating one small user pain point. Let AI handle the rest.


Product recommendations have come a long way from static, one-size-fits-all suggestions to dynamic, AI-driven personalization. In the early days, businesses used manual curation or simple algorithms that grouped users based on shared behaviors. However, with the rise of big data and machine learning, recommendations have become smarter, faster, and more relevant.

Today, AI-powered engines analyze browsing history, purchase patterns, and even real-time interactions to predict what users want before they even search for it. From e-commerce and streaming to finance and healthcare, personalized recommendations have transformed how businesses engage customers, making interactions seamless, intuitive, and highly effective.

Logic behind AI recommendations

AI-powered engines analyze vast amounts of data to predict what customers may want based on past behaviors, preferences, and trends. Here’s how they work:

How AI Understands Customer Data

AI systems use sophisticated methods to process and interpret multiple data points, delivering personalized suggestions.

Different Types of AI-Powered Recommendation Systems

Collaborative Filtering

Collaborative filtering operates on the principle that users with similar behaviors will have similar preferences. It analyzes past behaviors and interactions of different users to suggest items that others with comparable interests have liked. The system can be either user-based or item-based, depending on the focus of the algorithm.
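
To make the idea concrete, here is a minimal item-based collaborative filtering sketch in Python (the ratings matrix and item names are made-up illustrative data):

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = items; 0 means the user has not rated the item yet.
ratings = pd.DataFrame(
    [[5, 4, 0, 1],
     [4, 5, 1, 0],
     [1, 0, 5, 4],
     [0, 1, 4, 5]],
    index=["user_1", "user_2", "user_3", "user_4"],
    columns=["running_shoes", "fitness_tracker", "coffee_maker", "blender"],
)

# How similar items are to each other, based on how users co-rated them.
item_similarity = pd.DataFrame(
    cosine_similarity(ratings.T), index=ratings.columns, columns=ratings.columns
)

def recommend(user, top_n=2):
    # Score each item by similarity-weighted ratings, then keep only unrated items.
    user_ratings = ratings.loc[user]
    scores = item_similarity.dot(user_ratings) / item_similarity.sum(axis=1)
    return scores[user_ratings == 0].sort_values(ascending=False).head(top_n)

print(recommend("user_1"))  # items user_1 has not rated, ranked by predicted interest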

Content-Based Filtering

This approach recommends products based on the attributes of items a user has previously engaged with. For instance, if a customer frequently purchases sports shoes, the system will suggest other products with similar characteristics, such as running gear or fitness trackers.
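
Similarly, a minimal content-based sketch can rank items by how closely their descriptions match a product the customer already bought (the catalog below is invented for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Product catalog described by text attributes (illustrative data).
products = {
    "running_shoes": "lightweight running shoes for road and trail sports",
    "fitness_tracker": "fitness tracker watch with heart rate and step tracking",
    "coffee_maker": "drip coffee maker with programmable timer",
    "yoga_mat": "non-slip yoga and fitness exercise mat",
}

names = list(products)
vectors = TfidfVectorizer(stop_words="english").fit_transform(products.values())
similarity = cosine_similarity(vectors)

# Recommend items most similar to something the customer already purchased.
purchased = "running_shoes"
idx = names.index(purchased)
ranked = sorted(zip(names, similarity[idx]), key=lambda pair: pair[1], reverse=True)
print([name for name, score in ranked if name != purchased][:2])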

Hybrid Recommendation Systems

A combination of collaborative and content-based filtering, hybrid models offer more accurate suggestions by leveraging the strengths of both methods. Netflix, for example, uses a hybrid approach, analyzing viewing history, genres, and ratings to recommend content tailored to each user.

Latest Advancements in AI Recommendations

  • Generative AI for Personalized Shopping: AI is now capable of generating hyper-personalized shopping experiences by understanding consumer intent through natural language processing (NLP).
  • Large Language Models (LLMs): Retailers are integrating LLMs like GPT to enhance customer interactions, making product discovery more engaging.
  • AI Chatbots & Voice Assistants: These tools assist shoppers in finding products by answering queries, making recommendations, and even processing transactions in real-time.
  • Computer Vision: AI-powered image recognition allows users to upload pictures to find similar products, enhancing convenience in online shopping.
  • Context-Aware Recommendations: AI is evolving to analyze external factors such as location, time of day, and weather to refine its suggestions.

Why AI Recommendations Matter to Consumers and Businesses

AI-powered recommendations bridge the gap between businesses and consumers, creating a win-win situation. They enhance user experiences by delivering relevant suggestions while helping businesses drive engagement, sales, and loyalty.

Benefits for Consumers

  • Personalized Shopping Experience: AI ensures that consumers receive product recommendations tailored to their preferences, saving them time and effort.
  • Enhanced Convenience: AI-powered recommendations allow users to quickly find relevant products, making online shopping smoother and more efficient.
  • Improved Decision-Making: By analyzing trends and previous interactions, AI helps users discover products they may not have otherwise considered.
  • Higher Satisfaction: AI-driven recommendations often lead to more satisfying purchases as they align closely with the customer’s needs and preferences.
  • Better Content Discovery: On streaming platforms and e-books, AI assists in surfacing content that matches a user’s viewing or reading habits, enhancing entertainment experiences.

Benefits for Businesses

  • Increased Customer Engagement: AI-driven recommendations keep customers engaged by showing them relevant products, leading to higher interaction rates.
  • Boost in Sales & Revenue: Businesses using AI recommendation engines have reported up to a 50% increase in revenue (Source: Harvard Business Review, 2023).
  • Higher Conversion Rates: AI-powered product suggestions have been shown to increase conversion rates by up to 20% (Source: McKinsey & Company, 2022).
  • Customer Retention & Loyalty: Personalized recommendations improve customer retention by 30%, as consumers appreciate platforms that cater to their unique preferences (Source: Gartner, 2023).
  • Optimized Marketing Strategies: AI can predict which products customers are likely to buy next, allowing businesses to create targeted campaigns that maximize ROI.
  • Reduced Cart Abandonment: Personalized recommendations at checkout encourage customers to complete their purchases, minimizing lost sales.
  • Competitive Advantage: Companies leveraging AI for recommendations stay ahead by delivering superior user experiences compared to businesses relying on traditional methods.

AI-driven recommendation systems use machine learning and data analytics to provide customers with personalized product suggestions based on their immediate preferences and behaviors. This approach not only enhances customer satisfaction but also significantly impacts revenue by increasing the average order value and improving customer retention.

Statistics Highlighting the Impact

The impact of AI-powered recommendations is backed by data, proving just how essential they are for businesses and consumers alike. Here are some key statistics highlighting their effectiveness:

Market Growth: The product recommendation engine market is projected to grow from $7.42 billion in 2024 to $10.13 billion in 2025, with a compound annual growth rate (CAGR) of 36.5% (The Business Research Company).

Future Projections: By 2029, the market is expected to reach $34.77 billion, driven by increased demand for real-time and personalized shopping experiences (The Business Research Company).

Adoption Rate: Approximately 70% of companies are either implementing or developing digital transformation strategies, which include the use of recommendation engines (ZDNet, cited in Mordor Intelligence).

Impact on Sales: Product recommendations account for 35% of Amazon’s sales, highlighting their significant impact on revenue (Involve.me).

Consumer Preference: 83% of customers are willing to share their data for a more personalized shopping experience (Involve.me).

Conversion Rates: 49% of online purchases are made by consumers who did not intend to buy until they received personalized product recommendations (Digital Minds BPO).

  1. Hyper-Personalization: AI is moving towards offering deeply personalized experiences by analyzing micro-interactions and emotional responses.
  2. Integration with Augmented Reality (AR) & Virtual Reality (VR): AI-powered recommendations will merge with AR/VR to create immersive shopping experiences.
  3. Cross-Industry Adoption: Beyond retail, AI recommendations are being integrated into the healthcare, finance, and education sectors to enhance user engagement.
  4. Explainable AI (XAI): Researchers are working on making AI recommendations more transparent and interpretable to improve trust and user adoption.

Optimize Product Recommendations with CloudIQ Solutions

To maximize the potential of AI-powered recommendations, businesses need sophisticated, data-driven solutions. CloudIQ Solutions provides state-of-the-art AI recommendation engines designed to enhance personalization, increase engagement, and drive revenue growth. 

By leveraging advanced machine learning and deep learning models, CloudIQ helps businesses deliver tailored product suggestions that keep customers engaged and coming back for more. Stay ahead of the competition with CloudIQ’s intelligent recommendation technology.

Writing robust unit tests can often feel like a chore, especially when working with complex services like AI layers or cloud databases. But what if your AI-powered coding assistant could do most of the heavy lifting for you? Enter GitHub Copilot. In this blog post, we’ll explore how you can use Copilot to supercharge your .NET unit testing efforts, particularly when working with mocked services like AI layers, Cosmos DB repositories, and API integration testing. 

1. Mocking the AI Service Layer for Unit Tests 

Let’s say you’re testing a service that relies on an AI component (like sentiment analysis). Here’s how you can leverage Copilot to mock it efficiently. 

What You Need: 

  • An IAIService interface (e.g., Sentiment Analysis) 
  • A unit testing framework like xUnit, NUnit, or MSTest 
  • A mocking library such as Moq 

 
Copilot prompt:  Write a unit test that mocks IAIService and tests ReviewService.AnalyzeReviewAsync.

Example: 

[Fact] 
public async Task AnalyzeReviewAsync_ShouldReturnEnrichedReview() 
{ 
    // Arrange 
    var mockAIService = new Mock<IAIService>(); 
    mockAIService.Setup(x => x.AnalyzeSentimentAsync(It.IsAny<string>())) 
        .ReturnsAsync(new SentimentResult { Sentiment = "positive" }); 

    var reviewService = new ReviewService(mockAIService.Object, ...); 

    // Act 
    var result = await reviewService.AnalyzeReviewAsync("Great product!"); 

    // Assert 
    Assert.Equal("positive", result.Sentiment); 
} 

2. Testing Cosmos DB Repositories with In-Memory Emulators 

Cosmos DB can be tricky to test, but Copilot helps simplify the setup, whether you’re using the Cosmos DB Emulator or mocking the CosmosClient. 

What You Need: 

  • Cosmos DB Emulator or a mocked CosmosClient 
  • A repository pattern to abstract the data layer 
  • Test setup/teardown logic for clean test states 

Copilot prompt:  Write a unit test for ReviewRepository using a mocked CosmosClient.

Example: 

[Fact] 
public async Task AddReviewAsync_ShouldStoreReviewInDatabase() 
{ 
    // Arrange 
    var mockContainer = new Mock<Container>(); 
    var mockClient = new Mock<ICosmosDbService>(); 
    mockClient.Setup(x => x.AddReviewAsync(It.IsAny<Review>())) 
              .ReturnsAsync(new Review { Id = "123", Sentiment = "positive" }); 

    var repo = new ReviewRepository(mockClient.Object); 

    // Act 
    var result = await repo.AddReviewAsync(new Review()); 

    // Assert 
    Assert.Equal("123", result.Id); 
} 

This helps generate the full test structure, including initialization, mock setup, and expected assertions. 

3. API Testing with Mock Cosmos Responses + Sample AI Outputs 

For end-to-end testing, especially with complex dependency chains, Copilot makes integration testing smooth. 

What You Need: 

  • ASP.NET Core integration tests using WebApplicationFactory<> 
  • Mocked Cosmos and AI services injected via TestStartup.cs 
  • Dependency injection is managed using mock services. 

Copilot prompt:  Write an integration test that posts a review to the API and verifies the response.

[Fact] 
public async Task PostReview_ShouldReturnEnrichedReview() 
{ 
    var client = _factory.CreateClient(); 

    var content = new StringContent(JsonConvert.SerializeObject(new ReviewDTO 
    { 
        Text = "Amazing product!" 
    }), Encoding.UTF8, "application/json"); 

    var response = await client.PostAsync("/api/review", content); 

    response.EnsureSuccessStatusCode(); 

    var responseContent = await response.Content.ReadAsStringAsync(); 
    var result = JsonConvert.DeserializeObject<Review>(responseContent); 

    Assert.Equal("positive", result.Sentiment); 
} 

Copilot will guide you through mocking services, crafting HTTP requests, and asserting responses. 

Best Practices: Using GitHub Copilot for Unit Testing 

Here are a few pro tips to get the best out of Copilot when writing tests: 

  • Use natural prompts – Be specific: “mock AI response”, “test repo method”, “test POST endpoint”. 
  • Let Copilot fill your test boilerplate – Start with the test method name and let Copilot generate the rest. 
  • Iterate on Copilot’s suggestions – Accept and tweak Copilot’s code, then let it improve upon your changes. 
  • Use [Theory] and inline data – Copilot understands data-driven tests; give it input/output pairs. 
  • Use WebApplicationFactory for integration tests – Helps test actual endpoints with a clean startup configuration. 
  • Add comments to guide Copilot – Inline guidance like // Mock sentiment API improves Copilot’s accuracy. 

Conclusion 

With Copilot by your side, you no longer need to dread writing tests. It’s like pair programming with an assistant who knows the framework, mocking strategies, and best testing patterns. Whether you’re mocking AI services, emulating Cosmos DB, or writing full-stack integration tests, GitHub Copilot can help turn a time-consuming task into a productive part of development. 



Creating a Web API project from the ground up can often be a time-consuming process, especially when managing multiple layers such as API controllers, business logic, and data handling. With the support of GitHub Copilot, it becomes easier to leverage AI to generate and extend Web API projects efficiently. This article outlines the process of setting up a structured ASP.NET Core Web API project and using GitHub Copilot to add new endpoints quickly. 

Setting Up the Project Structure 

The process begins with creating a new ASP.NET Core Web API project organized with a modular structure for easy maintenance. The following layers are typically included: 

  • API Layer: Contains controllers that define API endpoints and handle CRUD operations for product management. 
  • Business Layer: Hosts the core business logic, including validation and product-related rules. 
  • Data Layer: Includes a class with hardcoded product data for initial testing, eliminating the need for a database during early development. 
  • Model Layer: Defines the Product model with properties such as Id, Name, Price, and Description. 

Using GitHub Copilot to Build the API 

Once the foundational structure is in place, GitHub Copilot can be used to streamline the development of project components. Instead of manually writing every part of the application, developers can guide Copilot using tailored prompts. 

For example, after setting up the Model Layer, a prompt such as the following can be used: 

“Create a controller with endpoints to manage products (CRUD operations) in an API.” 

This enables Copilot to generate an API controller with Create, Read, Update, and Delete endpoints. These endpoints are integrated with the Business Layer and utilize the hardcoded data from the Data Layer. 

Extending the API with New Endpoints 

To expand the API, GitHub Copilot can be prompted while preserving the existing architecture. For instance, to add an endpoint that filters products by price, a prompt like the following can be used: 

Add an endpoint to filter products based on price greater than or less than a given amount, and update the following layers: 

  1. API Layer: Add a new endpoint for filtering by price. 
  2. Business Layer: Add filtering logic. 
  3. Model Layer: Ensure necessary properties exist. 
  4. Data Layer: Support filtering in the repository.
     

Copilot can then generate the corresponding code across all relevant layers, adapting to the project’s structure and maintaining consistency. 

Running the Project Locally 

To build and run the project locally, GitHub Copilot can assist with environment setup. Using tools like Visual Studio Code and the .NET CLI, the project can be quickly compiled and executed. After resolving any dependencies and integrating Copilot’s code suggestions, testing can be conducted using tools like Postman to ensure all endpoints function correctly. 

Key Benefits of Using GitHub Copilot 

Implementing GitHub Copilot for Web API development offers multiple advantages: 

  • Time Efficiency: Significantly reduces time spent on repetitive coding tasks. 
  • Consistency: Helps maintain a uniform coding pattern across different layers. 
  • Faster Iterations: Simplifies the process of adding or modifying features. 

Conclusion 

By combining a well-structured ASP.NET Core Web API with GitHub Copilot, teams can accelerate development while maintaining clean, scalable architecture. GitHub Copilot’s ability to understand project context and respond to developer prompts proves highly effective in enhancing productivity. For developers seeking to streamline their Web API workflows, GitHub Copilot offers a practical and efficient AI-powered solution. 


Artificial intelligence is the use of technologies to build machines and computers that can mimic cognitive functions associated with human intelligence, such as seeing images, listening to speech, understanding text, and making recommendations. Machine learning is a subset of AI that lets a machine learn from data without being explicitly programmed. 

Google Cloud Platform (GCP) offers a rich suite of AI and machine learning tools catering to users across different experience levels — from business analysts to seasoned ML engineers. Whether you’re analyzing structured data, classifying images, building custom deep learning models, or tapping into generative AI, there’s a GCP service tailored for you. 

In this expanded guide, you’ll learn: 

  • Key AI/ML tools in Google Cloud 
  • How to use them from the Cloud Console 
  • Practical examples and sample code 
  • Use cases, limitations, and best practices 
  • A comprehensive summary table to guide your choices 

1. BigQuery ML 

BigQuery ML democratizes machine learning by enabling analysts to build models using standard SQL syntax directly within BigQuery. It’s ideal for use cases involving large, structured datasets — like customer churn prediction, sales forecasting, and classification tasks. 

Key Features 

  • Supports classification, regression, time series, and clustering.
  • Uses built-in SQL functions for training, evaluation, and prediction.
  • No need to move data outside BigQuery.

Accessing BigQuery ML 

  1. Navigate to BigQuery Studio – https://console.cloud.google.com/bigquery 
  2. Open a project and select your dataset. 
  3. Use the SQL workspace to run queries like CREATE MODEL or ML.PREDICT. 

Sample: Logistic Regression 

CREATE OR REPLACE MODEL `my_dataset.customer_churn_model` 
OPTIONS(model_type='logistic_reg') AS 
SELECT 
  tenure, 
  monthly_charges, 
  contract_type, 
  churn 
FROM 
  `my_dataset.customer_data`;


Best Practices 

  • Normalize your numeric features before training.
  • Use ML.EVALUATE to assess model performance.
  • Partition large datasets for efficient model training.
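
Once a model like the churn example above is trained, it can be scored with ML.PREDICT. Below is a minimal sketch that runs such a prediction from Python with the BigQuery client library (the scoring table `my_dataset.new_customers` is an assumed placeholder):

from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

# Score new rows with the trained model; the output includes a predicted_churn column.
query = """
SELECT predicted_churn, tenure, monthly_charges
FROM ML.PREDICT(
  MODEL `my_dataset.customer_churn_model`,
  (SELECT tenure, monthly_charges, contract_type FROM `my_dataset.new_customers`)
)
"""

for row in client.query(query).result():
    print(dict(row))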

2. Vertex AI 

Vertex AI is GCP’s fully managed ML platform that provides a single UI and API for the complete ML lifecycle. It includes support for AutoML, custom model training, pipelines, feature store, and model deployment. 

a) Vertex AI AutoML 

AutoML simplifies model training by abstracting the heavy lifting of data preprocessing, feature selection, and hyperparameter tuning. It supports: 

  • Tabular classification/regression 
  • Image classification/object detection 
  • Text sentiment/entity analysis 
  • Video classification 


Accessing AutoML 

  1. Go to the Vertex AI Dashboard – https://console.cloud.google.com/vertex-ai 
  2. Click “Train new model”. 
  3. Choose your data type (tabular, image, text, etc.). 
  4. Upload or link to your dataset. 
  5. Follow the guided training wizard. 


Sample Code: Image AutoML (Python) 

from google.cloud import aiplatform 
 
aiplatform.init(project="your-project", location="us-central1") 
 
dataset = aiplatform.ImageDataset.create( 
    display_name="my-image-dataset", 
    gcs_source=["gs://your-bucket/images/"], 
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification, 
) 
 
job = aiplatform.AutoMLImageTrainingJob( 
    display_name="image-classifier-job", 
    model_type="CLOUD", 
    multi_label=False, 
) 
 
model = job.run( 
    dataset=dataset, 
    model_display_name="my-image-model", 
    training_fraction_split=0.8, 
    validation_fraction_split=0.1, 
    test_fraction_split=0.1, 
) 


Limitations 

  • Training time depends on data size and complexity.
  • Less control over model internals.

b) Vertex AI Custom Training 

Custom training is for advanced users who want to use frameworks like TensorFlow, PyTorch, or XGBoost. You can train models using your own scripts in Docker containers or managed Jupyter environments. 


Accessing Custom Training 

  1. In Vertex AI, go to Training > Custom Jobs. 
  2. Choose your container or upload a training script. 
  3. Specify machine specs (CPUs, GPUs, TPUs). 


Sample Code (Python SDK) 

from google.cloud import aiplatform 
 
aiplatform.init(project="your-project-id", location="us-central1") 
 
job = aiplatform.CustomTrainingJob( 
    display_name="custom-train-job", 
    script_path="train.py", 
    container_uri="gcr.io/cloud-aiplatform/training/tf-cpu.2-11:latest", 
    model_serving_container_image_uri="gcr.io/cloud-aiplatform/prediction/tf2-cpu.2-11:latest" 
) 
 
model = job.run(replica_count=1, machine_type="n1-standard-4") 


Use Cases 

  • NLP models using Transformers.
  • Computer vision models with custom CNNs.
  • Reinforcement learning pipelines.

3. Pre-trained APIs 

Google Cloud offers pre-trained APIs that let you access powerful AI capabilities with minimal setup. These are RESTful services available via API calls or SDKs. 

Key Services 

  • Vision API – Image labeling, OCR, object detection 
  • Natural Language – Sentiment, syntax, entity recognition 
  • Speech-to-Text – Audio transcription 
  • Text-to-Speech – Audio generation from text 
  • Translation – Language translation 


Accessing Pre-trained APIs 

  1. Go to the API Library – https://console.cloud.google.com/apis/library 
  2. Enable the required API. 
  3. Create credentials (API key or service account). 
  4. Use client libraries (Python, Node.js, Java, etc.) or REST calls. 


Sample: Vision API (Label Detection) 

from google.cloud import vision 
 
client = vision.ImageAnnotatorClient() 
 
with open("photo.jpg", "rb") as image_file: 
    content = image_file.read() 
 
image = vision.Image(content=content) 
response = client.label_detection(image=image) 
 
for label in response.label_annotations: 
    print(f"{label.description}: {label.score:.2f}") 


4. Generative AI with Gemini  

Google’s Gemini APIs power generative AI features such as chatbots, summarization, code completion, and document synthesis. These are hosted on Vertex AI with tools like Model Garden and Vertex AI Studio. 


Accessing Generative AI 

  1. Visit Vertex AI Studio – https://console.cloud.google.com/vertex-ai/studio 
  2. Use a prompt gallery or freeform chat interface. 
  3. Choose a language model (Gemini Pro, Gemini Flash, etc.). 
  4. For programmatic access, use the vertexai Python SDK. 


Sample Code: Text Generation 
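
Below is a minimal sketch using the vertexai Python SDK (the project, region, and model name are placeholders; use any Gemini model enabled in your project):

import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

# Load a Gemini model and generate a short piece of text.
model = GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Write a two-sentence summary of what Vertex AI Studio offers.")
print(response.text)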


Use Cases 

  • Automating customer service (chatbots).
  • Creative writing or story generation. 
  • Code suggestions or bug fixes.


5. Choosing the Right Tool 

  • Predict outcomes with SQL – BigQuery ML; ideal for data analysts; no code required; great for structured data. 
  • Train models with minimal code – Vertex AI AutoML; ideal for citizen developers; low code; handles preprocessing and tuning. 
  • Train advanced ML/DL models – Vertex AI custom training; ideal for ML engineers; code-heavy; use your own framework and logic. 
  • Extract insights from media/files – Pre-trained APIs; for all developers; low code; fastest way to use AI. 
  • Build chatbots or code generators – Generative AI (Gemini); for all developers; low code; great for LLM and content generation tasks. 


6. Conclusion and Resources 

Google Cloud provides one of the most comprehensive, scalable, and user-friendly AI ecosystems available today. With services for every level of expertise, you can start with SQL in BigQuery ML and grow into training deep models in Vertex AI. Pair that with powerful APIs and generative tools — and you have everything you need to build production-ready AI. 


Helpful Links


Happy experimenting! 

CloudIQ is excited to announce our Sitecore Silver Partnership designation, marking a significant milestone in our commitment to delivering exceptional digital experiences and e-commerce solutions.

This partnership highlights our expertise in Sitecore’s comprehensive suite of products, including Sitecore XP, Sitecore XM Cloud, Sitecore Content Hub, Sitecore CDP, Sitecore Personalize, Sitecore Send, and Sitecore OrderCloud.

Sitecore Solutions Expertise

As a Sitecore Silver Partner, CloudIQ has successfully met Sitecore’s rigorous standards for competency and performance. Our team of Sitecore-certified MVPs brings proven expertise from initial planning to full-scale deployment, ensuring optimal performance of Sitecore platforms across a wide range of business needs.

Client-Centric Approach

CloudIQ understands the need for cost-effective solutions and a strategic approach. From infrastructure audits to comprehensive Sitecore solution design and implementation, we offer scalable models for onshore and hybrid teams, ensuring flexibility and efficiency in project execution. Our team is committed to driving your digital transformation with Sitecore, offering expert guidance and seamless migrations to the latest versions and Sitecore SaaS offerings.

Commitment to Quality and Customer Success

Achieving Sitecore Silver Partnership highlights CloudIQ’s dedication to excellence. We continue to focus on driving customer success, ensuring that every solution we implement is tailored to meet the unique needs of our clients. From strategy and design to execution and optimization, CloudIQ’s Sitecore expertise ensures that businesses can effectively navigate the complexities of the digital landscape and achieve sustained growth.

Contact us today to learn how CloudIQ’s Sitecore expertise can optimize your digital ecosystem.

For decades, improving patient experience has been a top priority for healthcare systems. Yet, despite these efforts, many still face challenges like long wait times, impersonal interactions, and inefficiencies that hinder optimal care.

Artificial intelligence has begun to revolutionize healthcare delivery, offering solutions that personalize care, reduce friction in patient interactions, and provide valuable insight. But how deep does AI’s potential go in transforming patient experience? The more we explore, the more we realize that AI isn’t just a tool; it’s a fundamental shift in how healthcare will operate in the years to come. In this blog, we explore AI’s impact on patient interactions, the various AI solutions in use today, and why adoption rates are skyrocketing across the industry.

AI Patient Experience Solutions Delivering Value Across Multiple Use Cases

1. Intelligent Virtual Assistants

Intelligent virtual assistants, often powered by chatbots, are revolutionizing patient engagement by providing immediate responses to inquiries and facilitating appointment scheduling. These AI solutions enhance communication between patients and healthcare providers, ensuring that patients receive timely information about their health needs and appointments. They can also offer personalized health tips based on individual patient data, thereby improving adherence to treatment plans.

2. Remote Patient Monitoring

Remote patient monitoring utilizes AI to continuously track patients’ vital signs and health metrics through wearable devices. This technology allows healthcare providers to receive real-time updates on patient conditions, enabling proactive interventions when necessary. By integrating remote monitoring with telehealth services, patients can maintain regular contact with their healthcare teams without the need for frequent in-person visits.

3. Predictive Analytics

Predictive analytics in healthcare employs AI to analyze historical patient data and predict future health outcomes. This capability allows providers to identify at-risk patients early, facilitating timely preventive measures. By understanding patterns and trends in patient data, healthcare organizations can tailor interventions to improve overall health outcomes and reduce hospital readmissions.

4. Personalized Communication Solutions

AI-driven personalized communication tools enhance the way healthcare providers interact with patients. By analyzing individual preferences and behaviors, these solutions can deliver customized messages, reminders for appointments, and educational resources tailored to specific health conditions. This personalized approach fosters better engagement and satisfaction among patients.

5. Automated Administrative Processes

AI is optimizing administrative tasks like scheduling, billing, and claims processing, reducing the administrative burden on healthcare staff. This shift to automation enhances operational efficiency and allows healthcare teams to contribute to a better overall patient experience.

Leading Patient Experience Solutions in the Market

1. GetWellNetwork

GetWellNetwork is a leading platform that enhances patient engagement through interactive technology solutions. It provides personalized content and resources tailored to individual patient needs, improving their overall experience during hospital stays. The platform enables patients to access educational materials about their conditions and treatment options while facilitating communication with their care teams.

2. LumaHealth

LumaHealth focuses on improving patient communication through its innovative platform that automates appointment reminders and follow-ups. By utilizing AI-driven messaging systems, LumaHealth ensures that patients receive timely notifications about their appointments and necessary health checks. This proactive approach helps reduce no-show rates and enhances patient adherence to care plans.

3. AthenaHealth

AthenaHealth offers a comprehensive suite of cloud-based services designed to improve practice management and patient engagement. Its AI capabilities include automated appointment scheduling, billing processes, and personalized communication strategies that enhance the overall patient experience. The platform’s focus on interoperability ensures seamless data sharing among providers for coordinated care.

4. Phreesia

Phreesia is an innovative solution that streamlines the check-in process for patients using digital kiosks and mobile applications powered by AI technology. Patients can complete forms electronically, reducing wait times and enhancing operational efficiency for healthcare providers. This solution also allows for personalized messaging based on individual health needs.

5. CipherHealth

CipherHealth specializes in enhancing post-discharge communication through its AI-driven outreach programs. By utilizing automated calls and text messages, CipherHealth ensures that patients receive essential follow-up care instructions after leaving the hospital. This proactive engagement helps improve recovery rates and reduces readmission risks.

AI Adoption and Market Growth in the Healthcare Industry

The adoption of AI in healthcare has seen remarkable growth over the past few years. According to recent studies by Dialog Health, approximately 86% of healthcare organizations are investing in AI technologies to improve patient experiences. 

The global market for AI in healthcare is projected to reach $45 billion by 2026, reflecting a compound annual growth rate (CAGR) of over 40% from its 2022 valuation, as reported by Market.us Scoop. These statistics underscore the increasing reliance on AI solutions as healthcare providers seek innovative ways to enhance efficiency while delivering personalized care.
Furthermore, surveys indicate that around 70% of patients express a willingness to use AI-powered digital health solutions for managing their health, according to findings from Chief Healthcare Executive. This growing acceptance among patients highlights the potential for AI technologies to transform traditional healthcare practices into more engaging and effective experiences.

Leading Healthcare Organizations Adopting AI-Powered Patient Experience Solutions

Our latest findings highlight the top 26 healthcare organizations that are leading the charge in integrating AI into their patient experience strategies. These organizations are leveraging AI to improve patient engagement, reduce wait times, and enhance communication, ultimately providing more efficient and personalized care.

  1. Atlantic Health System
  2. Banner Health
  3. Children’s Healthcare of Atlanta
  4. Children’s Mercy Kansas City
  5. Grady Health System
  6. Guthrie
  7. Hackensack Meridian Health
  8. Intermountain Health
  9. Johns Hopkins Medicine
  10. LifeStance Health
  11. Maring Health
  12. Mile Bluff Medical Center
  13. Moffitt Cancer Center
  14. Ochsner Health
  15. Piedmont
  16. Premise Health
  17. Reynolds University
  18. Saint Luke’s Health System Kansas
  19. Sanford Health
  20. Southcoast Health
  21. Southeast Georgia Health System
  22. Trinity Health
  23. UC Davis Health
  24. Unified Women’s Healthcare
  25. WellSpan Health
  26. Wellstar Health System

By integrating AI-powered solutions such as virtual assistants, predictive analytics, and automated scheduling, these providers are improving not only operational efficiency but also patient satisfaction.

As AI continues to evolve, its role in healthcare is expanding, offering more precise and personalized care options. These advancements are allowing healthcare professionals to spend less time on administrative tasks and more time focusing on patient needs. With AI-driven tools, providers can anticipate patient needs, ensure timely follow-ups, and enhance the overall healthcare experience.

Interested in improving patient engagement and satisfaction through AI?

Contact us to explore customized solutions for your healthcare organization.

AI in Medical Imaging and Diagnostics

Medical imaging technologies have long been integral to diagnostic medicine, yet interpreting these images demands significant time and expertise. Artificial intelligence (AI) is addressing these challenges by transforming how medical images are analyzed, enabling faster, more precise, and reliable interpretations. This innovation is particularly impactful in fields like radiology, oncology, and neurology, where timely and accurate diagnoses can save lives.

In this blog, we’ll explore how AI is advancing medical imaging, delve into its real-world applications that are helping doctors make better decisions, and examine how AI is being adopted in healthcare systems worldwide. With AI’s growing presence in medical imaging, it’s paving the way for more accurate diagnoses and faster, better care for patients.

Medical Imaging Solutions Delivering Value Across Multiple Use Cases

Here are some of the top AI solutions used in medical imaging, along with their primary use cases:

1. IBM Watson for Oncology

Use Case: Oncology Diagnostics

IBM Watson for Oncology leverages AI to analyze vast amounts of medical data, including clinical literature and patient records, to assist oncologists in making treatment decisions. It provides personalized recommendations based on a patient’s unique profile, enhancing the precision of cancer care.

2. ENDEX by Enlitic

Use Case: General Medical Imaging Analysis

ENDEX utilizes deep learning algorithms to analyze various medical images such as X-rays, CT scans, and MRIs. It detects abnormalities like tumors and fractures with high accuracy, aiding in early diagnosis and treatment planning. Its user-friendly interface facilitates integration into existing workflows, making it accessible to healthcare providers.

3. IDx-DR

Use Case: Ophthalmology

IDx-DR is an FDA-approved autonomous AI system specifically designed for detecting diabetic retinopathy through retinal image analysis. It evaluates images captured by fundus cameras, identifying critical signs of the disease that could lead to blindness if not addressed promptly.

4. Zebra Medical Vision

Use Case: Multi-specialty Imaging Analysis

Zebra Medical Vision offers a suite of AI solutions that analyze medical images across various specialties, including radiology and cardiology. The platform is capable of detecting conditions such as fractures, cardiovascular diseases, and liver conditions from X-rays and CT scans, facilitating timely interventions.

5. Arterys Cardio AI (Tempus Pixel Cardio)

Use Case: Cardiovascular Imaging

This solution automates the analysis of cardiac MRI images using advanced deep learning algorithms. It quantifies cardiac parameters like blood flow and tissue characterization, providing clinicians with valuable insights for diagnosing and managing heart conditions with enhanced accuracy.

6. Siemens Healthineers AI-Rad Companion

Use Case: Radiology Workflow Enhancement

The AI-Rad Companion automates the highlighting and quantification of anatomical structures in imaging studies such as chest CTs. This streamlines the workflow for radiologists by providing automated assessments that reduce interpretation time and improve diagnostic consistency.

7. Blackford

Use Case: Image Reconstruction

Blackford offers AI-powered solutions for medical image reconstruction that enhance detail and reduce noise in CT scans. This improves image quality, which is crucial for accurate diagnosis.

How Leading Healthcare Organizations Benefit from AI in Medical Imaging

1. Enhanced Diagnostic Accuracy

AI-powered solutions excel at identifying patterns and anomalies that might be subtle or overlooked by the human eye. For instance, AI algorithms trained on vast datasets can detect early-stage cancers, cardiovascular irregularities, and other conditions with remarkable precision. This improves diagnostic confidence and reduces the risk of misdiagnosis.

2. Early Detection of Diseases

AI can analyze medical images to detect early signs of diseases before they become symptomatic. This capability allows for the identification of conditions such as cancers, heart disease, and neurological disorders in their earliest stages, when treatment options are often more effective and less invasive. By recognizing subtle patterns that may be missed by the human eye, AI enables timely interventions, improving patient outcomes.

3. Faster Diagnosis and Intervention

Traditional imaging analysis can be time-intensive, particularly in high-volume healthcare settings. AI significantly reduces the time needed to process and interpret imaging results, enabling physicians to provide quicker diagnosis. This is especially critical in emergency situations, such as stroke or trauma, where time is a crucial factor.

4. Personalized Treatment Planning

By analyzing imaging data alongside patient histories and other clinical factors, AI can assist in creating tailored treatment plans. For example, it can predict tumor progression or assess the likely success of a particular therapy, ensuring that treatment is customized to the individual patient’s needs.

5. Improved Workflow and Productivity

AI automates repetitive tasks such as image segmentation, prioritization of urgent cases, and report generation. This allows radiologists and other healthcare professionals to focus on complex cases and patient care, reducing burnout and enhancing overall productivity.

AI Medical Imaging Market Growth

The global AI medical imaging market is projected to grow significantly, from $5.86 billion in 2024 to $20.40 billion by 2029, reflecting a compound annual growth rate (CAGR) of 28.32% (Source: MarketsandMarkets, 2023). This growth is driven by the increasing adoption of AI technologies for disease diagnosis and image analysis, which are enhancing diagnostic accuracy and operational efficiency.

Similarly, the AI diagnostics market is expected to rise from $1.85 billion in 2024 to $14.76 billion by 2034, at a CAGR of 23.1% (Source: Allied Market Research, 2023). This expansion is largely driven by the growing demand for accurate diagnostic solutions and the integration of AI into various diagnostic processes.

Leading Healthcare Organizations Adopting AI-Powered Medical Imaging

Our recent research has identified the top 32 healthcare organizations that have successfully integrated AI technologies into their medical imaging practices, setting new standards in diagnostic accuracy, efficiency, and patient care.

The continued adoption of these technologies promises to elevate the quality of care, enabling faster, more precise diagnoses and improving decision-making across various medical specialties. As AI becomes more integrated into medical imaging, it not only enhances diagnostic accuracy but also optimizes workflows, allowing healthcare professionals to focus more on patient care. 

With healthcare systems worldwide embracing AI innovations, patients will benefit from timely, personalized care, while medical professionals gain the solutions needed to deliver better health outcomes. The advancements in AI medical imaging are already making a significant difference in healthcare, with their impact expected to grow in the coming years.

Interested in enhancing your diagnostic processes with AI solutions?

Reach out for more details!

AI Chatbots in Healthcare

The healthcare industry is experiencing a digital shift, with AI-powered chatbots playing a key role in improving efficiency and patient care. These intelligent assistants streamline operations, support healthcare professionals, and engage patients in real-time.

As AI chatbots gain traction in healthcare, they are driving improvements in everything from patient scheduling to providing personalized medical advice. This blog explores the unique ways AI chatbots are making a difference, the growth in their adoption, and some of the top AI chatbot solutions that are setting new standards in the industry.

How Leading Healthcare Organizations Use AI Chatbots

Symptom Assessment and Triage

AI chatbots analyze patient symptoms, providing initial triage to direct individuals to appropriate care levels, which reduces the burden on emergency departments. Their advanced natural language processing (NLP) capabilities allow them to assess the urgency of symptoms and suggest potential diagnoses, ensuring timely care.

Appointment Scheduling

Chatbots automate the appointment booking process by integrating seamlessly with electronic health record (EHR) systems. This eliminates administrative overhead, reduces scheduling errors, and ensures patients can easily book, reschedule, or cancel appointments.

Medication Reminders

AI chatbots send personalized reminders to patients for medication intake, enhancing adherence to treatment plans and reducing hospital readmissions. This promotes better recovery and overall patient health.

Post-Treatment Follow-Up

After patient discharge, chatbots check in with patients to gather recovery data and alert physicians if intervention is needed. This continuous monitoring improves patient outcomes by ensuring timely follow-up and proactive care.

Health Education

With access to vast medical databases, chatbots provide accurate and easy-to-understand health information, empowering patients to make informed decisions about their care. This helps improve patient education and overall engagement.

24/7 Patient Support

AI chatbots offer round-the-clock assistance, answering questions, providing symptom information, and guiding patients on the next steps. This enhances patient satisfaction by ensuring continuous access to support and timely care.

Scalability Across Healthcare Systems

AI chatbots can be deployed across large healthcare networks, ensuring consistent patient support across multiple locations. Their ability to manage a high volume of interactions simultaneously makes them essential in supporting large patient populations.

AI-Powered Chatbot Solutions in the Market

Ada Health

Ada Health uses AI to guide patients through a personalized diagnostic process. By asking targeted questions and leveraging a symptom checker, it provides instant medical advice based on patient-reported symptoms, seamlessly integrating with healthcare systems to streamline patient interactions.

Buoy Health

Buoy Health’s AI-driven platform assists patients in assessing symptoms and recommending potential diagnoses. It helps alleviate strain on healthcare systems by triaging cases before they reach a physician or clinic.

Woebot

Woebot delivers cognitive behavioral therapy (CBT) to support mental well-being. By tracking mood changes and offering therapeutic conversations, it also directs users to appropriate healthcare resources when necessary.

IBM Watsonx Assistant

IBM Watsonx Assistant aids healthcare providers and patients by answering medical queries, scheduling appointments, and supporting administrative tasks. Its advanced natural language processing ensures accurate responses based on integrated healthcare databases.

Notable Patient AI Platform

Notable’s AI platform enhances patient engagement by automating appointment scheduling, check-ins, and post-visit follow-ups. Integrated with electronic health record (EHR) systems, it streamlines patient-provider interactions.

AI Adoption and Market Growth in the Healthcare Industry

The healthcare chatbot market is experiencing rapid growth, with projections indicating an increase from $0.35 billion in 2023 to $0.45 billion in 2024, reflecting a compound annual growth rate (CAGR) of 25.7% (Persistence Market Research, 2024). By 2030, the market is expected to reach $1.18 billion, with a robust CAGR of 27.7% from 2024 to 2028. This growth is fueled by rising healthcare costs, shortages of healthcare professionals, and the increasing demand for immediate access to medical information.

In terms of adoption, approximately 37% of consumers reported using generative AI for health-related purposes as of late 2024 (PYMNTS, 2024). Additionally, about 75% of healthcare leaders are either experimenting with or planning to scale generative AI across their organizations (The Business Research Company, 2024). These statistics highlight the growing recognition of AI chatbots as essential tools in enhancing operational workflows and improving patient engagement within the healthcare sector.

Leading Healthcare Organizations Adopting AI Chatbot Solutions

Our latest research identifies 20 healthcare organizations at the forefront of integrating AI-powered chatbot solutions. These industry leaders are transforming patient experience, offering a new level of efficiency, personalization, and responsiveness in care delivery.

  • AccentCare
  • Ballad Health
  • CommonSpirit Health
  • Gillette Children’s Hospital
  • Hattiesburg Clinic
  • LifeBridge Health
  • LifeStance Health
  • MemorialCare
  • Nemours
  • Northwell Health
  • Ochsner Lafayette General
  • Palomar Health
  • Parkview Health
  • Saint Luke’s Health System Kansas
  • Sharp HealthCare
  • Southeast Georgia Health System
  • Tidelands Health
  • UC Davis Health
  • Vituity
  • Wellstar Health System

The integration of AI chatbots is transforming how healthcare providers communicate with patients. By automating routine inquiries and streamlining communication, chatbots are enhancing patient access to information, reducing wait times, and supporting clinical workflows. These advancements allow healthcare teams to focus more on direct patient care, while patients receive quicker and more accurate responses to their needs.

As the healthcare sector embraces AI-powered chatbots, the focus shifts to improving patient experience, operational efficiency, and overall care outcomes. These solutions are not only reducing operational costs but also shaping the future of healthcare interactions, making them more accessible and efficient.

Ready to elevate patient experience with AI-driven chatbot solutions?

Contact us to learn more!

AI in Medical Transcription

In an industry as critical as healthcare, time spent on administrative tasks is time taken away from patient care. With growing patient volumes, rising operational costs, and mounting administrative burdens, traditional documentation methods are becoming increasingly unsustainable. Physicians, under significant pressure, are confronted with time-consuming transcription processes that detract from the quality of patient care.

As the demand for faster, more accurate transcription rises, healthcare organizations are turning to advanced solutions that promise to streamline workflows and alleviate administrative strain. With powerful AI-driven solutions combining automation and intelligent transcription, healthcare providers can significantly reduce time spent on paperwork while enhancing the accuracy and completeness of their medical records. 

This shift is more than a technological upgrade, it represents a necessary transformation that drives operational efficiency, ensures advanced documentation, and ultimately enhances patient care.

Why Leading Healthcare Organizations Use AI-Based Transcription

1. Saves Time for Physicians

AI-based clinical documentation significantly reduces the time physicians spend on note-taking. Instead of manually typing notes, physicians can dictate their findings during or after consultations. By automating this process, it cuts the hours spent on post-visit paperwork, freeing physicians to focus on patient care, see more patients, reduce overtime, and maintain a better work-life balance.

2. Improves Documentation Accuracy

With advanced technologies like voice recognition and natural language processing (NLP), AI-based clinical documentation ensures that complex medical terms, diagnoses, and abbreviations are accurately captured. This reduces the risk of errors in patient records, which is essential for maintaining quality care and ensuring regulatory compliance.

3. Customization for Medical Specialties

Modern clinical documentation solutions adapt to the specific needs of different specialties, whether it’s cardiology, oncology, or radiology. They support specialty-specific templates, terminology, and formats. Tailored documentation improves the precision and relevance of medical records across various healthcare fields.

4. Cost Savings

By reducing reliance on manual documentation processes and traditional transcription services, AI-driven documentation minimizes operational costs. Digital solutions eliminate errors, reduce rework, and speed up workflows. These savings allow healthcare facilities to allocate resources more effectively, improving patient services and optimizing operational budgets.

5. Enhances Operational Efficiency

By streamlining the documentation process, these AI solutions improve overall operational efficiency in healthcare settings. Notes are generated faster, post-visit documentation is simplified, and integration with EHR systems eliminates redundant data entry. This optimized workflow enables healthcare teams to function more productively, ultimately improving patient care delivery.

AI-Based Transcription Solutions in the Market

1. CloudIQ Technologies Transcription Solution

CloudIQ’s AI-powered telemedicine application, built on Microsoft Azure Services, uses OpenAI’s Whisper model to transcribe physician-dictated notes in real time and leverages the ChatGPT API to convert them into structured, accurate documentation. By seamlessly integrating into clinical workflows, this solution reduces administrative burdens, saves physicians over 2 hours per day, and enables the treatment of 4,000 additional patients daily, improving productivity and patient care.
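As a rough, generic sketch of this two-step pattern (transcribe the audio, then structure the text), here is what it might look like with the OpenAI Python SDK. This is illustrative only, not CloudIQ’s actual Azure-based implementation, and the file name and model names are placeholders.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: transcribe a dictated note with Whisper (file name is a placeholder)
with open("dictation.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# Step 2: ask a chat model to turn the raw transcript into a structured clinical note
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Convert the dictated text into a structured clinical note."},
        {"role": "user", "content": transcript.text},
    ],
)
print(completion.choices[0].message.content)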

2. Dragon Medical One

Dragon Medical One by Nuance is a cloud-based speech recognition software that allows healthcare professionals to dictate directly into healthcare systems. It supports medical terminology and customizable commands for efficient documentation. Its cloud-based architecture facilitates remote access, enhancing productivity across various settings.

3. M*Modal Fluency

3M M*Modal Fluency uses AI to assist with real-time voice recognition and transcription, converting speech into structured clinical documentation. It integrates with EHR systems, streamlining documentation across various specialties and improving accuracy by recognizing medical terminology specific to fields like radiology and cardiology.

4. DeepScribe 

DeepScribe transcribes physician-patient conversations in real time, automatically structuring documentation with minimal input. It integrates into hospital workflows, handles specialized medical terminology, and automates post-encounter documentation.

5. Suki AI

Suki leverages voice recognition for clinical documentation, provides real-time diagnosis code suggestions to assist with coding and billing, and retrieves patient details like medications, allergies, and history, supporting informed decision-making during consultations.

AI Adoption and Market Growth in the Healthcare Industry

The global healthcare AI market was valued at USD 19.27 billion in 2023 and is projected to grow at a compound annual growth rate (CAGR) of 38.5% from 2024 to 2030 (Grand View Research). This rapid growth underscores the increasing adoption of AI technologies across healthcare systems.

AI-powered solutions, particularly in clinical documentation, are gaining significant traction. The clinical documentation market alone is expected to grow from USD 2.5 billion in 2024 to USD 6.6 billion by 2031 (Persistence Market Research). These solutions are helping healthcare providers automate documentation processes, save time, and enhance the accuracy of patient records.

In fact, 79% of healthcare organizations have already adopted AI technologies (Microsoft-IDC Study, 2024), reflecting the sector’s significant move toward digital transformation.

Top Healthcare Companies Adopting AI Clinical Documentation

Our latest research highlights 40 leading healthcare organizations that have successfully adopted AI-powered clinical documentation solutions. These innovators are transforming healthcare documentation with cutting-edge technology, setting new standards for efficiency and accuracy in patient care.

The successful adoption of AI clinical documentation tools by these organizations underscores a broader trend in healthcare. As more institutions embrace these technologies, the focus shifts toward improving clinician productivity, enhancing patient care, and ensuring regulatory compliance.

With AI clinical documentation solutions, these organizations are not only increasing efficiency but also paving the way for the next evolution in healthcare. As we move forward, we can expect AI-powered transcription to become an essential part of every healthcare provider’s toolkit.

Looking to reduce administrative burdens in healthcare with AI-driven clinical documentation?

Contact us to get started!

Data has become one of the most valuable assets for businesses, driving the decisions that lead to growth and efficiency. But on its own, raw data can be complex and difficult to interpret. That’s where data analysis comes in. By breaking down information, identifying patterns, and creating visual insights, data analysis makes it possible to transform numbers into clear answers. Microsoft Power BI is one such tool, making this process not only easier but more accessible to users at all levels.

In this blog, we’ll walk through the basics of data analysis and visualization, explore why it’s essential for businesses today, and see how Power BI empowers organizations to make smarter, data-driven decisions. Whether you’re new to data or just beginning to explore Power BI, this guide will give you the foundational steps to start creating your own data-driven reports.

What is Data Analysis?

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.  In today’s business world, data analysis plays a crucial role in making decisions more scientific and helping businesses operate more effectively.

What is Business Intelligence?

Business intelligence (BI) is a technology-driven process for analyzing data and delivering actionable information that helps executives, managers and workers make informed business decisions. As part of the BI process, organizations collect data from internal IT systems and external sources, prepare it for analysis, run queries against the data and create data visualizations, BI dashboards and reports to make the analytics results available to business users for operational decision-making and strategic planning.

What is Power BI?

Microsoft Power BI is an interactive data visualization software product developed by Microsoft with a primary focus on business intelligence. It is part of the Microsoft Power Platform.

Power BI is a collection of software services, apps, and connectors that work together to turn unrelated sources of data into coherent, visually immersive, and interactive insights. Data may be input by reading directly from a database, webpage, PDF, or structured files such as spreadsheets, CSV, XML, JSON, XLSX, and SharePoint.

The parts of Power BI

Power BI consists of several elements that all work together, starting with these three basics:

  • A Windows desktop application called Power BI Desktop.
  • An online software as a service (SaaS) service called the Power BI service.
  • Power BI Mobile apps for Windows, iOS, and Android devices.

These three elements—Power BI Desktop, the service, and the mobile apps—are designed to let users create, share, and consume business insights.

How Power BI Works: From Data to Insights

Power BI’s workflow can be summarized in three main steps: Connect, Transform, and Visualize.

  • Connect: Power BI allows you to connect various data sources, including Excel files, databases, cloud services, and APIs.
  • Transform: Once connected, you can clean and shape your data using Power Query Editor. This step involves removing errors, merging datasets, and creating calculated columns to prepare your data for analysis.
  • Visualize: After transforming your data, you can start creating interactive visualizations using the built-in chart types, custom visuals, and filters.

Why Power BI is a Game-Changer for Businesses

  1. Accessibility: One of Power BI’s greatest strengths is its accessibility. It’s designed for business users, not just IT professionals or data scientists. With an intuitive interface and a wealth of online resources, users can start creating reports and dashboards with minimal training.
  2. Cost-Effective: Compared to other business intelligence tools, Power BI offers a cost-effective solution with its freemium model. Small businesses can start with Power BI Desktop, which is free, and scale up to Power BI Pro or Premium as their needs grow.
  3. Real-Time Insights: In today’s fast-paced business environment, real-time data is crucial. Power BI allows users to connect to live data sources and build dashboards that update in real time. This ensures that decision-makers always have access to the most current information.
  4. Scalability: Power BI is suitable for businesses of all sizes, from small startups to large enterprises. It can handle small datasets in Excel as well as massive data warehouses, providing a scalable solution that grows with the business.
  5. Integration with Other Tools: Power BI’s ability to integrate with other Microsoft products, as well as third-party services, makes it a versatile tool in any business’s technology stack. Whether we are working in Excel, managing projects in Azure, or collaborating in Teams, Power BI can enhance our workflow.

Hands-On Exercise: Creating a Sales Dashboard

In this exercise, we’ll create a simple sales dashboard using sample data. Follow these steps to get started:

Step 1: Download and Install Power BI Desktop

1. Go to the Power BI website.

2. Download and install Power BI Desktop, which is free to use.

Step 2: Import Sample Data

For this exercise, we’ll be using a sample dataset, which you can download below.

1. Open Power BI Desktop.

2. Click on Home > Get Data > Excel.

3. Browse and select the sample Excel file.

4. In the Navigator window, select the worksheet that contains your data, and click Load.

Step 3: Clean and Transform the Data

Before visualizing the data, let’s make sure it’s clean and ready for analysis.

1. Click on Transform Data to open the Power Query Editor.


2. Here, you can remove any unwanted columns, rename columns, or change data types. For example, ensure the “Date” column is formatted as a Date type.

3. Once done, click Close & Apply.

Step 4: Create Visualizations

Now that your data is ready, it’s time to create some visualizations.

Bar Chart

1. Click on Bar Chart from the Visualizations pane.

2. Drag the “Product” field to the Y-axis and the “Sales” field to the X-axis.

3. This will create a bar chart showing total sales by product.

Line Chart

1. Click on Line Chart from the Visualizations pane.

2. Drag the “Date” field to the X-axis and the “Sales” field to the Y-axis.

3. This will create a line chart showing sales trends over time.

4. Clicking on “Expand all down one level in the hierarchy” option will display the chart in terms of month and year.

Map

1. Click on the Map visualization.

2. Drag the “Country” field to Location and the “Sales” field to Size.

3. This will create a map showing sales distribution by country.

Step 5: Create a Dashboard

Let’s bring all these visuals together into a single dashboard.

1. Arrange your charts on the report canvas by dragging and resizing them.

2. You can add slicers to filter data interactively. For instance, add a slicer for the “Date” field to filter data by specific time periods.

3. Customize the visuals using formatting options like colors, labels, and titles.

Step 6: Save and share your report

Once your dashboard is complete:

1. Click on File > Save As to save your report.

2. To share it, publish it to the Power BI Service by clicking Publish on the Home ribbon, and then choose where to save it.

3. The published report can be viewed in the Power BI service at https://app.powerbi.com/.


Congratulations on completing the exercise! You’re one step closer to mastering Power BI.

By now, you’ve seen how easy it is to get started with Power BI. Through this hands-on exercise, you’ve learned how to connect to data, clean it up, and create interactive visualizations, all key steps in turning raw information into useful insights.

Power BI’s user-friendly interface and powerful features make it an ideal tool for anyone looking to dive into data analytics. With just a little practice, you’ll be able to create impactful reports and dashboards that can help drive smarter decisions in your business. Keep exploring Power BI, experiment with your own data, and start turning your insights into action.

Introduction

In the dynamic world of business, companies are always looking for innovative solutions to enhance competitiveness, drive down costs, and augment profits while embracing sustainability. Enter Artificial Intelligence (AI), a transformative tool that goes beyond mere automation, particularly with the advent of generative AI. This blog aims to explore the deeper layers of how companies can not only leverage AI to cut costs and boost profits but also contribute to building a sustainable future.

1. Automation

At its core, AI’s role in automation extends far beyond streamlining routine processes. Integrating AI into automation processes enables a more nuanced understanding of data, allowing for predictive analysis and proactive decision-making. This, in turn, minimizes downtimes and optimizes resource allocation. Moreover, AI-driven automation facilitates the identification of inefficiencies and bottlenecks that may go unnoticed in traditional systems, enabling companies to fine-tune their processes for maximum efficiency. In terms of cost reduction, AI excels in repetitive and rule-based tasks, reducing the need for manual labor and minimizing errors. Beyond the financial benefits, incorporating AI into automation aligns with sustainability goals by optimizing energy consumption, waste reduction, and overall resource management.

2. Predictive Analytics

AI’s real-time data processing capabilities empower companies with predictive analytics, offering a glimpse into the future of their operations. By analyzing historical data, AI forecasts market trends, customer behaviors, and potential risks. Consider a retail giant utilizing AI algorithms to predict customer preferences. This not only optimizes inventory management but also contributes to waste reduction and sustainability efforts.

By predicting future market trends, customer behavior, and operational needs, businesses can optimize their resource allocation, streamline operations, and minimize waste. This not only trims costs but also enhances profitability by aligning products and services with market demands. Moreover, predictive analytics enables companies to anticipate equipment failures, preventing costly downtime and contributing to a more sustainable operation. Harnessing the power of AI in predictive analytics is not just about crunching numbers; it’s about gaining insights that empower strategic decision-making, fostering a resilient and forward-thinking business model.

3. Personalization at Scale

Generative AI enables hyper-personalization by analyzing vast datasets to understand individual preferences, behaviors, and trends. Companies can utilize advanced algorithms to tailor products or services in real-time, offering a personalized experience that resonates with each customer. This not only fosters customer satisfaction but also drives increased sales and brand loyalty. On the cost front, AI streamlines operations through predictive analytics, optimizing supply chain management, and automating routine tasks. This not only reduces operational expenses but also enhances efficiency. In terms of sustainability, AI aids in resource optimization, minimizing waste and energy consumption. By understanding customer preferences at an intricate level, companies can produce and deliver exactly what is needed, mitigating excess production and waste.

4. Supply Chain Optimization

AI’s pivotal role in optimizing supply chains is revolutionizing sustainability efforts. Generative AI aids in demand forecasting, route optimization, and inventory management, minimizing waste and reducing the carbon footprint. Retail giants like Walmart have successfully implemented AI-powered supply chain solutions, resulting in substantial cost savings and environmental impact reduction.

AI can optimize various facets of the supply chain, from demand forecasting to inventory management. By analyzing historical data and real-time information, AI algorithms can make accurate predictions, preventing overstock or stockouts, thereby minimizing waste and maximizing efficiency. Additionally, AI-driven automation in logistics can streamline operations, cutting down on manual errors and reducing labor costs. Route optimization algorithms can optimize transportation, not only saving fuel and time but also curbing the carbon footprint. Predictive maintenance powered by AI ensures that equipment is serviced proactively, preventing costly breakdowns. Overall, the integration of AI into supply chain processes empowers companies to make data-driven decisions, fostering agility and resilience, ultimately translating into reduced costs, increased profits, and a more sustainable business model.

5. Predictive Maintenance

Generative AI’s impact extends to equipment maintenance, transforming the game by predicting machinery failures. Analyzing data from sensors and historical performance, AI algorithms forecast potential breakdowns, enabling proactive maintenance scheduling. This not only minimizes downtime but also significantly reduces overall maintenance costs, enhancing operational efficiency.

Picture this: instead of waiting for equipment to break down and incurring hefty repair costs, AI algorithms analyze historical data, sensor inputs, and various parameters to predict when machinery is likely to fail. This foresight enables businesses to schedule maintenance precisely when needed, minimizing downtime and maximizing productivity. This involves not just reacting to issues but proactively preventing them. By harnessing AI for predictive maintenance, companies can extend the lifespan of equipment, optimize resource allocation, and, ultimately, boost their bottom line. Moreover, reducing unplanned downtime inherently aligns with sustainability goals, as it cuts down on unnecessary resource consumption and waste associated with emergency repairs.

6. Fraud Detection

The ability of AI to detect patterns and anomalies proves invaluable in combatting fraud. Financial institutions, for instance, deploy generative AI to analyze transaction patterns in real-time, identifying potentially fraudulent activities. This not only safeguards profits but also bolsters the company’s reputation by ensuring a secure environment for customers.

AI systems can analyze vast datasets with unprecedented speed and accuracy, identifying intricate patterns and anomalies that might escape human detection. By deploying advanced machine learning algorithms, companies can create dynamic models that adapt to emerging fraud trends, ensuring a proactive approach rather than a reactive one. This not only minimizes financial losses but also reduces the need for resource-intensive manual reviews. Additionally, AI-driven fraud detection enhances customer trust by swiftly addressing security concerns. By curbing fraud, companies not only protect their bottom line but also contribute to sustainability by fostering a more secure and resilient business environment. It’s a win-win scenario where technology not only safeguards financial interests but aligns with the broader ethos of responsible and enduring business practices.

Conclusion

In conclusion, the integration of AI, especially generative AI, into business operations unveils many opportunities for companies seeking to reduce costs, increase profits, and champion sustainability. From the foundational efficiency of automation to the predictive prowess of analytics, and the personalized touch of generative AI, businesses can strategically utilize these tools for transformative outcomes. Supply chain optimization, predictive maintenance, and fraud detection further amplify the impact, showcasing the diverse applications of AI.

However, as organizations embark on this AI journey, ethical considerations and environmental consciousness must not be overlooked. Striking a balance between innovation and responsibility is paramount for sustained success. The future belongs to those companies that not only leverage AI for operational excellence but also actively contribute to creating a sustainable and equitable business landscape.

Introduction

Lately, there has been a viral buzz surrounding the term “generative AI.” It’s hard to scroll through social media without bumping into these mind-blowing, AI-generated hyper-realistic images and videos in various genres. These AI creations not only produce captivating visuals but also play a significant role in facilitating business growth, leaving us in awe.

While AI has been an integral part of our lives for quite some time, the current surge in creativity and complexity displayed in these generative creations makes it challenging when delving deeper into its workings.

If you’re an aspiring data analyst, machine learning engineer, or other professional who wishes to understand the basics of AI, this guide is for you. Let’s explore the different evolutions of artificial intelligence and the science behind it in simpler terms, and we’ll also delve into the top service providers of AI and how businesses leverage them in today’s landscape.

What is Artificial Intelligence?

Artificial Intelligence refers to the capability of machines to imitate human intelligence. This isn’t about robots replacing humans; rather, it’s the quest to make machines smart, enabling them to learn, reason, and solve problems autonomously.

AI empowers machines to acquire knowledge, adapt to changes, and independently make decisions. It’s like teaching a computer to think and act like a human.

Machine Learning

AI, or artificial intelligence, involves a crucial element known as machine learning (ML). In simpler terms, machine learning is akin to training computers to improve at tasks without providing detailed instructions. Machines utilize data to learn and enhance their performance without explicit programming. ML, a subset of AI, concentrates on creating algorithms for computers to learn from data. Instead of explicit programming, these systems use statistical techniques to continually improve their performance over time.
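To make that concrete, here is a tiny, illustrative example (not from the original post) using scikit-learn: rather than hand-coding pass/fail rules, we give the model a few labeled examples and let it learn the pattern.

from sklearn.tree import DecisionTreeClassifier

# Toy data: features are [hours_studied, hours_slept]; labels are 1 = passed, 0 = failed
X = [[8, 7], [1, 4], [6, 8], [2, 5], [7, 6], [0, 3]]
y = [1, 0, 1, 0, 1, 0]

# The model learns the mapping from the examples instead of being explicitly programmed
model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[5, 7]]))  # generalizes to an unseen input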

Prominent Applications of ML include:

Time Series Forecasting: ML techniques analyze historical time series data to project future values or trends, applicable in domains like sales forecasting, stock market prediction, energy demand forecasting, and weather forecasting.

Credit Scoring: ML models predict creditworthiness based on historical data, enabling lenders to evaluate credit risk and make well-informed decisions regarding loan approvals and interest rates.

Text Classification: ML models categorize text documents into predefined categories or sentiments, with applications such as spam filtering, sentiment analysis, topic classification, and content categorization.

Recommender Systems: ML algorithms are widely utilized in recommender systems to furnish personalized recommendations. These systems learn user preferences from historical data, suggesting relevant products, movies, music, or content.

While scaling a machine learning model to a larger dataset may compromise accuracy, another notable drawback is the manual determination of relevant features by humans, based on business knowledge and statistical analysis. Additionally, ML algorithms face challenges when handling intricate tasks involving high-dimensional data or complex patterns. These limitations spurred the development of Deep Learning (DL) as a distinct branch.

Deep Learning

Taking ML to the next level, Deep Learning (DL) involves artificial neural networks inspired by the human brain, mimicking how our brains work. Employing deep neural networks with multiple layers, DL grasps hierarchical data representations, automating the extraction of relevant features and eliminating the need for manual feature engineering. DL excels at handling complex tasks and large datasets efficiently, achieving remarkable success in areas like computer vision, natural language processing, and speech recognition, despite its complexity and challenges in interpretation.
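As a rough illustration (assuming TensorFlow/Keras, which the original post does not specify), a deep network is essentially a stack of layers, each building a higher-level representation on top of the one before it:

import tensorflow as tf

# A small multi-layer network for classifying flattened 28x28 images into 10 classes
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),   # first hidden layer
    tf.keras.layers.Dense(64, activation="relu"),    # second hidden layer
    tf.keras.layers.Dense(10, activation="softmax")  # output probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()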

Common Applications of Deep Learning:

  • Autonomous Vehicles: DL is essential for self-driving cars, using deep neural networks for tasks like object detection, lane detection, and pedestrian tracking, allowing vehicles to understand and react to their surroundings.
  • Facial Recognition: DL is used in training neural networks to detect and identify human faces, enabling applications such as biometric authentication, surveillance systems, and personalized user experiences.
  • Precision Agriculture: Deep learning models analyze data from various sources like satellite imagery and sensors for crop management, disease detection, irrigation scheduling, and yield prediction, leading to more efficient and sustainable farming practices.

However, working with deep learning involves handling large datasets that require constant annotation, a process that can be time-consuming and expensive, particularly when done manually. Additionally, DL models lack interpretability, making it challenging to modify or understand their internal workings. Moreover, there are concerns about their robustness and security in real-world applications due to vulnerabilities exploited by adversarial attacks.

To address these challenges, Generative AI has emerged as a specific area within deep learning.

Generative AI

Now, let’s discuss Generative AI, the latest innovation in the field. Instead of just identifying patterns, generative AI goes a step further by actually producing new content, aiming to create output that closely resembles what humans might make.

A notable example is Generative Adversarial Networks (GANs), which use advanced neural networks to create realistic content such as images, text, and music. Think of it as the creative aspect of AI. A prime example is deepfakes, where AI can generate hyper-realistic videos by modifying and combining existing footage. It’s both impressive and a bit eerie.

Generative AI finds applications in various areas:

  • Image Generation: The model learns from a large set of images and creates new, unique images based on its training data, producing imaginative visuals from text prompts.

  • Video Synthesis: Generative models can generate new content by learning from existing videos. This includes tasks like video prediction, where the model creates future frames from a sequence of input frames, and video synthesis, which involves generating entirely new videos. Video synthesis is useful in entertainment, special effects, and video game development.
  • Social Media Content Generation: Generative AI can automate content creation for social media platforms. By training models on extensive social media data, such as images and text, these models can produce engaging and personalized posts, captions, and visuals. The generated content is tailored to specific user preferences and current trends.

In a nutshell, AI is the big brain, Machine Learning is its learning process, Deep Learning is the intricate wiring, and Generative AI is the creative spark.

From spam filters to face recognition and deep fakes, these technologies are shaping our digital world. It’s not just about making things smart; it’s about making them smart in a way that feels almost, well, human.

Top Companies Leveraging AI in their Business:

As AI continues to advance and assert its influence in the business realm, an increasing number of companies are harnessing its capabilities to secure a competitive edge. Below are instances of businesses utilizing AI systems to optimize their operations:

Amazon: The renowned e-commerce retailer uses AI for diverse functions such as product recommendations, warehouse automation, and customer service. Amazon’s AI algorithms scrutinize customer data to furnish personalized product suggestions, while AI-powered robots in its warehouses enhance the efficiency of order fulfillment processes.

Netflix: This streaming service leverages AI to analyze user data and offer personalized content recommendations. By comprehending user preferences and viewing patterns, Netflix personalizes the viewing experience, ultimately boosting user engagement and satisfaction.

IBM: The multinational technology company utilizes its AI platform, Watson, across various sectors for tasks like data analysis, decision-making, and customer service. Watson adeptly analyzes extensive volumes of both structured and unstructured data, enabling businesses to obtain valuable insights and make more informed decisions.

Google: The prominent search engine giant integrates AI for search optimization, language translation, and advertising. Google’s AI algorithms possess the capability to comprehend and process natural language queries, deliver more precise search results, and furnish personalized advertising based on user data.

Conclusion

In conclusion, the rise of generative AI has undeniably captivated our imagination, showcasing its potential not only in creative endeavors but also as a driving force behind business growth.

As we witness the impressive applications of AI in companies like Amazon, Netflix, IBM, and Google, it becomes evident that AI’s transformative influence on various industries is profound.

Looking ahead, the question arises: What might follow generative AI? Could it be interactive AI? As businesses continue to embrace and leverage AI capabilities, the evolution of this technology holds the promise of more interactive and human-like experiences.

SAP BusinessObjects Data Services (BODS) delivers a single enterprise-class solution for data integration, data quality, data profiling, and text data processing that allows you to integrate, transform, improve, and deliver trusted data to critical business processes. With SAP BusinessObjects Data Services, IT organizations can maximize operational efficiency with a single solution to improve data quality and gain access to heterogeneous sources and applications.

The important functions of SAP BODS are:

  • Extraction, Transformation, and Loading (ETL): Extracts data from any supported database or table, transforms it, and loads it into any other database or table.
  • Data Warehousing: A database designed and developed in a particular format for data analysis and reporting, populated with data from one or more source systems.
  • Data Migration: The process of moving data from one place to another. It is a subset of ETL in which data is relocated from one software system or database to another.
  • Business Intelligence: Analyzes an organization’s data effectively and helps improve business performance.

Logging into the SAP BODS Designer:

You must have access to a local repository to log into the software. Typically, you create a repository during installation. However, you can create a repository at any time using the Repository Manager and configure access rights within the Central Management Server.

  1. Enter your user credentials for the CMS.
  2. Click Log on. The software attempts to connect to the CMS using the specified information. When you log in successfully, the list of local repositories that are available to you is displayed.
  3. Select the repository you want to use.
  4. Click OK to log in using the selected repository.

BODS – Object Hierarchy

Designer window

The Designer user interface consists of a single application window and several embedded supporting windows.

  1. Project area: Contains the current project (and the job(s) and other objects within it) available to you at a given time. In the software, all entities you create, modify, or work with are objects.
  2. Workspace: The area of the application window in which you define, display, and modify objects.
  3. Local object library: Provides access to local repository objects including built-in system objects, such as transforms, and the objects you build and save, such as jobs and data flows.
  4. Tool palette: Buttons on the tool palette enable you to add new objects to the workspace.

Creating Datastores in DS Designer

To develop Data Migration work, you first need to create data stores for the source and the target system.

Step 1:

Click Create Data Stores.

A new window will open.

Step 2:

Enter the Datastore name, Datastore type, and database type. You can select a different database as the source system, and you also need to provide the credentials for that particular database.

Step 3:

Click OK and the Datastore will be added to the Local object library list. If you expand Datastore, it does not show any table.

Data Migration Flow

Step 1:

Create a new project. Click the option, Create Project. Enter the Project Name and click Create. It will be added to the Project Area.

Step 2:

Right click on the Project name and create a new batch job/real time job.

Step 3:

Enter the name of the job and press Enter. You have to add Workflow and Data flow to this. Select a workflow and click the work area to add to the job. Enter the name of the workflow and double-click it to add to the Project area.

Step 4:

In a similar way, select the Data flow and bring it to the Project area. Enter the name of the data flow and double-click to add it under the new project.

Step 5:

Now drag the source table under datastore to the Work area. Now you can drag the target table with a similar data type to the work area or you can create a new template table.

To create a new template table right click the source table, Add New → Template Table. Or we can select it from the tool palette: Click the template table icon and drag it inside a data flow to place the template table in the workspace.

Step 6:

Drag the Query Transform to the workspace. Drag and connect them using the line from the source table to query transform and query transform to the target table. Click the Save All option at the top of the project menu.

Step 7:

Click on the Query transform and map the source Schema In columns that you want to include in the target table by dragging.

Query Transform:
  • The Query Transform is similar to a SQL SELECT statement (a rough code analogy follows this list).
  • It can perform the following operations:
    • Choose (filter) the data to extract from sources
    • Join data from multiple sources
    • Map columns from input to output schemas
    • Perform transformations and functions on the data
    • Add new columns, nested schemas, and function results to the output schema
    • Assign primary keys to output columns
    • Different functions can be performed using the Query Transform, such as LOOKUP, AGGREGATE, CONVERSIONS, etc.
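
Since the Query Transform behaves much like a SQL SELECT statement, the short PySpark sketch below shows the same kinds of operations in code form. This is only an analogy for readers who think in code, not BODS syntax; the table and column names (orders, customers, order_status, and so on) are purely illustrative.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("query-transform-analogy").getOrCreate()

# Illustrative source tables standing in for the datastore tables.
orders = spark.table("source_db.orders")
customers = spark.table("source_db.customers")

result = (
    orders
    .filter(F.col("order_status") == "OPEN")             # choose (filter) the data to extract
    .join(customers, on="customer_id", how="inner")      # join data from multiple sources
    .select("order_id", "customer_id", "customer_name",  # map input columns to the output schema
            "order_total")
    .withColumn("order_total_rounded",                   # add new columns / function results
                F.round(F.col("order_total"), 2))
)

# Load the result into the target table, analogous to the template/target table in the data flow.
result.write.mode("overwrite").saveAsTable("target_db.orders_enriched")
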
Step 8:

Click the Save All option at the top of the project menu. Now you can schedule the job using the Data Services Management Console, or you can manually execute it by right-clicking the job name and selecting Execute.

Once the job execution is complete, the data is transferred from the source to target databases based on the conditions that we have given in the query transform.

In conclusion, SAP BusinessObjects Data Services (BODS) is a GUI tool that allows you to create and monitor jobs that take data from various types of sources, perform complex transformations on the data as per the business requirement, and then load the data to a target, which again can be of any type (i.e., an SAP application, a flat file, or any database).

CloudIQ Attains New Microsoft Solutions Partner Designations

CloudIQ is proud to announce that we have attained three Microsoft Solutions Partner designations under the new Microsoft Cloud Partner Program – Azure Infrastructure, Data & AI, and Digital & App Innovation. The new partner program replaces Microsoft Silver and Gold competencies with new Solutions Partner Designations.

Microsoft Solutions Partner Designations

For each of the six Solutions Partner Designations – Infrastructure, Data & AI, Digital & App Innovation, Modern Work, Security, and Business Applications – under the new Microsoft Cloud Partner Program (MCPP), partners must meet the requirements in three different categories: Performance, Skilling, and Customer Success.

And we are happy to share that we have attained three of the six solution partner designations.

Microsoft Specializations

On top of the Solution Partner Designations, Microsoft also has advanced specialization programs that help demonstrate advanced technical expertise.

CloudIQ has earned advanced specialization in the Modernization of Web Applications to Azure and Kubernetes on Microsoft Azure.

Attaining the Microsoft Solution Partner Designations and Microsoft Advanced Specializations goes to show CloudIQ’s expertise and commitment to delivering best-in-class solutions for customers in any scenario and every industry.

We leverage our deep industry expertise to help businesses envision new products, create innovative business models, and deliver the next level of customer experiences by leveraging the cloud. From standalone cloud projects to enterprise-wide cloud architecture design you can rely on our cloud engineering expertise.

Get in touch with us to learn more.

Flutter is an open-source UI software development kit created by Google. It is used to develop cross platform applications for Android, iOS, Linux, macOS, Windows, and the web from a single codebase. Flutter apps are written in the Dart language.  Dart compiles to native machine code and hence it is optimized and has high performance.

Flutter is often compared with React Native, but there are a few key differences: Flutter supports Hot Reload, which allows developers to see code changes during development without a full app restart; Flutter draws its UI with its own rendering engine, while React Native renders through the platform’s native components. Flutter uses Dart, while React Native uses JavaScript – both have their strengths and weaknesses. Some may find Flutter easier to learn or be more familiar with, while others may prefer React Native.

Creating a new flutter app

After installing Flutter on your machine, you can create a Flutter project by using the flutter create command.

We can also create the project using IDEs like Visual Studio Code or Android Studio.

The main.dart file in lib folder is where we build our app.

We can run the sample app on an Android emulator created using Android Studio. The sample app displays the number of times we have pressed the + symbol.

Widgets

Flutter has a unique architecture that makes it easy to develop cross-platform mobile apps. Its architecture is built around a widget tree. This means that all the widgets and components are arranged in a tree structure. In Flutter, you can create your own widgets and reuse them in any project.

Material widgets implement the Material Design language for iOS, Android, and the web. Cupertino widgets implement the current iOS design language based on Apple’s Human Interface Guidelines. We mostly use the Material widgets in our code.

Some common widgets are:

1. Scaffold: Implements the basic Material Design visual layout structure. It will occupy its entire window or device screen.

2. AppBar: AppBar is usually the topmost component of the app, it contains the toolbar and some other common action buttons.

All remaining widgets in a Scaffold other than AppBar are usually defined in the ‘body’ property of the Scaffold.

3. Text: Used to display formatted text in the app.

4. Column: A widget that displays its children in a vertical array.

5. Row: A widget that displays its children in the horizontal direction.

The tree of widgets displayed in the sample app is given below:

Stateless and Stateful widgets

The widgets whose state cannot be altered once they are built are called stateless widgets. Below is the basic structure of a stateless widget. A stateless widget overrides the build() method and returns a widget.

import 'package:flutter/material.dart';
 
class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return Container();
  }
}

The widgets whose state can be altered after they are built are called stateful widgets. Below is the basic structure of a stateful widget. A stateful widget overrides the createState() method and returns a State object. It is used when the UI can change dynamically.

import 'package:flutter/material.dart';
 
class MyApp extends StatefulWidget {
 
  @override
  // ignore: library_private_types_in_public_api
  _MyAppState createState() => _MyAppState();
}
 
class _MyAppState extends State<MyApp> {
  @override
  Widget build(BuildContext context) {
    return Container();
  }
}

Whenever we change the values of properties that make up the state, we must wrap the change in a call to the setState function to inform Flutter to rebuild the widget and display it.

setState(() {
  _counter++;
});

Hot Reload

Flutter allows hot reloading during development.  Hot reloading allows us to keep the app running and to inject new versions of the files that we edited at runtime.  This way, we don’t lose any of our state which is especially useful if we are making the UI changes.  For example, in our sample code if we change the primarySwatch to Colors.red, the app color changes from blue to red, but the counter still shows 1 and doesn’t get reset to 0.

In conclusion, Flutter is a framework that is built on the Dart programming language and can be used to create native apps for Android and iOS. Flutter uses a widget-based architecture, and so you’ll be able to create apps that look and feel like the ones that you’ve seen on the Apple and Android stores.

When systems evolve the need to migrate database does arise based on the data we are dealing with. In one of our projects, we had a scenario where non-transactional data had to be accessed frequently with low latency across regions. Azure Cosmos DB is Microsoft’s fast NoSQL database and the first globally distributed database service in the market today to offer comprehensive service level agreements encompassing throughput, latency, availability, and consistency.

So, the choice was clear for us to move the data from PostgreSQL to CosmosDB. In PostgreSQL the data format is flat (i.e., in the form of tables and columns), while CosmosDB supports flexible schemas and hierarchical data, and thus it is well suited for storing catalog data. The JSON format supported by Cosmos DB is an effective format that is very lightweight.

For the data migration from PostgreSQL to CosmosDB we chose Azure Data Factory. Azure Data Factory helps you integrate, perform transformations, and visualize all your data with ease. It is an easy-to-use, cost-effective, fully serverless cloud service that accelerates data transformation with code-free data flows.

Data Transformation using Azure Data Factory

Azure Data Factory is an orchestration tool that is used to move and transform data from one source to another. It can process and transform raw data into predictions and insights, and it allows you to perform data transformation activities via pipelines. Data flows are created in debug mode to validate the logic of the transformation, the data flow activity is added to the pipeline to execute and test the data flow, and “Trigger now” is used to test the data flow in the pipeline.

Several options are available for converting JSON data to flat data; however, converting flat data into JSON is still a challenge. With JSON, inserting null values within the data flow is quite tedious, and a dedicated pipeline must be created to handle null values.
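
The transformation itself is done inside Data Factory’s mapping data flows, but the underlying idea is easier to see outside the tool. Below is a minimal, plain-Python sketch (not Data Factory code) of turning flat rows into nested JSON documents while filtering out null values; the column names are illustrative, loosely based on the patient, color, microchip, and breed details used in this migration.

import json

# Flat, table-like rows as they might come out of PostgreSQL; None marks missing values.
flat_rows = [
    {"patient_id": 1, "color": "brown", "microchip_no": "MC-001", "breed": None},
    {"patient_id": 2, "color": None, "microchip_no": None, "breed": "Beagle"},
]

def to_document(row):
    # Nest the flat columns into a hierarchical document.
    attributes = {
        "color": row["color"],
        "microchip": row["microchip_no"],
        "breed": row["breed"],
    }
    # Drop keys whose value is None so nulls never reach the JSON output;
    # this is the part that needs dedicated handling in a data flow.
    attributes = {k: v for k, v in attributes.items() if v is not None}
    return {"id": str(row["patient_id"]), "attributes": attributes}

documents = [to_document(r) for r in flat_rows]
print(json.dumps(documents, indent=2))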

Azure Data Factory has three main options:

  • Author
  • Monitor
  • Manage

The Author option provides the main environment for development. Using this option, we can design and manage Azure Data Factory resources such as the pipelines and dataflows.

The Monitor option allows you to monitor pipeline and trigger runs, sessions, and the time taken to execute the pipelines; check whether a pipeline execution succeeded or failed; and set up alerts.

The Manage option allows you to manage the connections, link service, source control, triggers, parameters, and security.

Now let’s look at how to migrate from PostgreSQL to CosmosDB and transform the data.

Pipeline and activities creation

The Author option provides the environment for the development of pipelines and data flows. So, the first step is to create a pipeline that contains the data flow activity. The pipeline is a logical grouping of activities that perform a task. By clicking on orchestrate on the home page, we create and name the pipelines.

The activities pane allows you to perform various activities such as move, transform, add data flow (we can use existing data flows or even create new ones), Azure Data Explorer, Azure Functions, Batch Service, Databricks, Data Lake Analytics, etc. Once the data flow is created, we can provide the transformation logic in the data flow canvas. The data flows distribute the processing of data over different nodes in a Spark cluster so the operations run in parallel. We also need to create a mapping data flow to perform the transformation.

We can choose the format of the data as per the requirement. We can also select the dataset, which is simply the view of data or the references of the data that you want to employ in your activity.

In our data migration scenario, for an activity that copies data from the source (the PostgreSQL data directory), the data is taken from the source and put into blob storage. The basic details are stored in the first copy activity, the patient color details in the second, microchip details in the third, breed details in the fourth, and existing patient details in the fifth copy activity. The data flow performs transformations on the data such as join, conditional join, aggregate, pivot, flatten, union, split, etc. Finally, triggers are scheduled to run the transformations in the pipelines.

Linked Service and Integration Runtime

When creating a dataset, we must first create a linked service to link the data store to Data Factory via the management hub. Linked services are much like connection strings and define the connection to the data source.

In our data migration scenario, we have created the following linked services for Azure Managed Instance, Azure Blob storage, CosmosDB (one for dev and one for UI), and PostgreSQL. To create the linked service, we must provide the server’s name, port, database name, username, password, etc. These are the input for dataflow to connect to the specific database. In this migration, the PostgreSQL data is migrated from a different VM environment into the Azure environment.

Here, the base directories are called containers; inside the containers we have several directories, which in turn contain the storage files.

The integration runtime via the management hub is the compute infrastructure for providing data integration capabilities such as data flow, data movement, activity dispatch, and SSIS package execution across different network environments. It provides the linkage between the activity and linked services. The default option is public for the Auto-resolve integration runtime. We can also create private customized ones based on the requirement for dataflow execution.

We hope this walkthrough of migrating data from PostgreSQL to Azure CosmosDB using Azure Data Factory was helpful. If you have queries or want us to help you with your data migration projects, feel free to reach out to us.

Angular is a component-based application design framework for building scalable, reactive, and efficient single-page apps. It provides a wide range of tools and well-integrated libraries to build, develop, and test your applications. Angular applications are written by composing HTML templates, and components are created to manage these templates. The application logic lives in services, and these services and components are wrapped into modules.

Angular Evolution

The Angular framework was first introduced a decade ago, in 2010, under the name AngularJS. Over the years, the framework has developed through various updates. AngularJS 1.x is a JavaScript-based framework for creating rich web and mobile applications. The Model-View-Controller (MVC) pattern and its variations Model-View-Presenter (MVP) and Model-View-ViewModel (MVVM) have been considered the AngularJS architecture. Even though AngularJS follows a proven architectural pattern and is a standard solution for an application with a view, it has certain disadvantages:

  • Memory leakage.
  • Internet Explorer 8.0 doesn’t support Angular JS.
  • AngularJS totally depends on JavaScript.

Angular 2+ is a TypeScript-based, free, open-source technology used to develop web and mobile applications. It is a component-based framework with a collection of well-integrated libraries. Angular 2+ is simple and pretty straightforward to use. Because it is built on TypeScript, it allows you to validate the code with ease and shows errors at the time of typing. Implementing forms and validation is much simpler and more effective with Angular 2+. With recent advancements, Angular’s responsive design has shifted towards a mobile-first approach.

Angular 3 was skipped because the framework was developed in a monorepo (a single repository) and its router package was already in its third version; to avoid confusion in terms of dependency, the third version was skipped. Angular 4 was released in 2017 and was compatible with both TypeScript 2.1 and 2.2; this version improved speed and performance to a good extent. Angular 5 was released around the end of 2017 and comprised many new features and improvements: it provided an optimizer that removed unnecessary code from applications, an improved compiler, improvements in Angular Universal for code allocation and, most important of all, support for TypeScript 2.4. Angular 6 was released in early 2018 and introduced Angular Elements and a new rendering engine. Angular 7 was released later that year with major performance improvements; it also provided a drag-and-drop module, Angular Material, and the Component Dev Kit. Angular 8 was released in 2019 with updated dependencies, improved web worker bundling, a new lazy loading syntax, and Angular Firebase support.

Angular 9 was introduced in 2020; it provided new features and updates such as a more consistent ng CLI, an updated and improved API extractor, a dependency injection update, better speed and performance, an AOT build that ensures a faster and better-performing compiler, and support for TypeScript 3.7. Angular 10 was released in June 2020 with new updates and features such as a new date range picker, an updated compiler, optional stricter settings to catch bugs ahead of time, performance improvements, and support for TypeScript 3.9.

Angular 11 was released in November 2020; major improvements include router performance, automatic inlining of fonts, updated Hot Module Replacement (HMR), faster builds and improved performance, support for TypeScript 4 and Webpack 5, and a component test harness with parallel functions. Angular 12 was released in May 2021 with major improvements in styling, support for TypeScript 4.2 and Webpack 5.3.7, nullish coalescing for writing better and cleaner code in TypeScript classes, an improved ng CLI, Protractor, new dev tools, and migration from legacy i18n message IDs. Angular 13 is also available with impactful changes towards optimization: it no longer has the View Engine, and with reduced dependency on ngcc we can hope for faster and improved compilation. Angular 13 supports TypeScript 4.4, provides an improved and modernized Angular Package Format and an improved Angular CLI, and no longer supports IE11.

Why Angular?

With the evolution of Angular over the years, it has become one of the most recommended frameworks for businesses and enterprises, for various reasons. Angular is used to develop single-page client applications using HTML and TypeScript. Angular offers two-way data binding, sharing data between the Model and the View; hence, when data is modified or changed, components are updated automatically in real time. Code reusability in an Angular application is also high.

Angular makes a great recommendation for businesses as it provides a framework that works well with back-end languages and combines business logic and UI very well. Angular provides an effective cross-platform development framework that makes the development process easier and reduces cost. Though it is initially a little complex to learn, it is worth it, as it produces high-quality applications. Angular’s TypeScript foundation helps developers write clean and neat code, which makes fixing bugs that much easier. The framework is structured around component-based development, which aids a steady development process with consistent and high reusability of components. This further helps provide better maintainability and productivity.

The evolution of Angular has brought various new features and improvements that have significantly increased speed and optimized performance. Older versions had larger bundle sizes, which hindered fast loading of applications, but with recent improvements such as lazy-loaded modules and the Ivy renderer, we can create lightweight web applications that are faster and better. To overcome productivity issues and provide a faster development process, features such as dependency injection and Angular services are also provided. Angular keeps evolving based on requests from Google and the Angular community, making it one of the ideal frameworks for businesses and enterprises.

Angular Architecture and its core components

The angular architecture contains the following core components

  • Module
  • Meta Data
  • Directives
  • Pipe
  • Service
  • Decorators

Module

Module in Angular refers to a place where you group the components, directives, pipes, and services, which are related to the application. Every Angular app has a root module that has the bootstrap mechanism to launch the app.

Meta Data

Metadata is used to decorate a class so that it can configure the expected behavior of the class. Metadata is attached to a TypeScript class using decorators.

Directives

Directives are classes that are used to change the behavior or view of DOM elements in Angular applications. 

Three types of Angular directives are as follows:

  1. Components – directives with a template.
  2. Attribute directives – directives that change the appearance or behaviour of an element, component, or another directive.
  3. Structural directives – directives that change the DOM layout by adding and removing DOM elements.

Pipe

A pipe is a simple way to transform values in an Angular template. Some of the built-in pipes in Angular are CurrencyPipe, DatePipe, JsonPipe, LowerCasePipe, UpperCasePipe, PercentPipe, and SlicePipe.

Service

Services are used to share methods and properties with other components across the entire project, which reduces the repetition of functions and properties. HTTP requests and responses are handled in services.

There are two types of services in angular.

  • Built-in services – There are approximately 30 built-in services in angular.
  • Custom services – In angular, if the user wants to create their own service, they can opt for custom services.

Decorators

Angular decorators are used to store metadata about a class, method, property, or parameter. There are four types of decorators in Angular:

  1. Class Decorators
  2. Property Decorators
  3. Method Decorators
  4. Parameter Decorators

1. Class Decorators

Class decorators are top-level decorators. They define the purpose of the class.

2. Property Decorators

Property decorators are applied to specific properties within a class. @Input() is an example of a property decorator.

3. Method Decorators

Method decorators are used to decorate methods within a class with additional functionality. @HostListener() is an example of a method decorator.

4. Parameter Decorators

Parameter decorators are used to decorate parameters, and they are used in class constructors. @Inject() is an example of a parameter decorator.

Testing 

Angular uses the Jasmine testing framework, which provides multiple functionalities for testing. Karma is the test runner; it uses a configuration file to set the start-up, reporters, and testing framework.

Limitations of Angular

Though Angular is one of the popular modern-day frameworks, it also has a few limitations:

  • Steep learning curve

Though Angular is a great framework, it can be quite difficult even for people with experience in HTML, JS, and CSS, and for people who are not used to n-tier architecture. They can find it hard to learn some of the concepts, as Angular has its own set of rules, which might be difficult and uncomfortable for novice learners.

  • Limited SEO options

Though Angular is a great and powerful platform to build single-page applications, limited SEO options and poor accessibility to search engine crawlers are some of the major drawbacks. This makes it difficult to place the website correctly in the list provided by the search engines.

  • Complex and Verbose

Angular has a complex framework of modules and extensive capabilities for integration and customization. Though Angular provides an array of online tutorials and documentation, it is quite uncomfortable to learn in the beginning. A slow and steady pace is recommended to learn the platform and language.

  • Complex directives

Angular has three types of directives: attribute directives, structural directives, and component directives. Each has its own limitations, and it is quite complex for beginners to understand when to use which.

Angular Prerequisites

Angular requires the following prerequisites – Node.js, the Angular CLI, and a text editor.

Supported by Google

Google offers Long-Term Support (LTS) for Angular, scaling up to enterprise Angular application development. Netflix, Gmail, YouTube TV, Upwork, and other organizations also use the Angular framework for application development.

Open-Replay

Open Replay is an open-source tool that developers can integrate with Angular applications. When users use the application, Open Replay tracks their sessions, so developers can easily see how user-friendly the application is. It has network tracker, Redux, NgRx action tracker, and profile tracker plugins, and it also captures the performance of the application.

With this article, we wanted to give you an introduction to Angular: the evolution of Angular over the decade, the need for Angular in today’s business world, and its core architecture. We covered the core components of the architecture and also the limitations of the framework. The bottom line is that Angular has become one of the best frameworks for developing single-page apps.

Azure App Service is a Platform as a Service (PaaS) that is used to build, deploy, and scale enterprise-grade applications such as web apps, mobile apps, logic apps, API apps, and function apps. It supports multiple programming languages and frameworks such as .NET, .NET Core, Java, Ruby, Node.js, PHP, and Python.

From a developer’s perspective, Azure App Service provides a great platform to develop, deploy, and scale applications. However, when it comes to production environments, Infrastructure as Code (IaC) comes in handy. Terraform is an open-source IaC tool with a consistent CLI that lets you write infrastructure as code using declarative configuration files and also manage, plan, and apply changes to infrastructure versions to reach the required configuration state.

Terraform is a good choice as it reduces manual human error by codifying the application infrastructure. Terraform manages infrastructure across more than 300 public clouds and it provides a reusable, cost-effective, and consistent environment that solves dependencies and version controls. In this article, we will take you through the process of deploying a web app in Azure App Service using Terraform.

To deploy the web app in Azure App Service using Terraform, here are the steps we need to follow:

  • Create the Resource Group
  • Create App Service plan and deploy web app

Create the Resource Group:

The first step is to create a resource group using the following Terraform code. Any resource that is created must be created within a resource group.

terraform {    
  required_providers {    
    azurerm = {    
      source = "hashicorp/azurerm"    
    }    
  }    
} 
   
provider "azurerm" {    
  features {}    
}

resource "azurerm_resource_group" "resource_group" {
  name     = "app-service-rg"
  location = "East US"
}

Create App Service plan and deploy web app

The App Service plan defines the capacity and resources to be shared among one or more app services that are assigned to that plan. Azure WebApp must be associated with an App Service Plan as it specifies the computing resources that are required for the web app to function. The following code creates an app service plan.

resource "azurerm_app_service_plan" "app_service_plan" {
  name                = "example-appserviceplan"
  location            = azurerm_resource_group.resource_group.location
  resource_group_name = azurerm_resource_group.resource_group.name

  sku {
    tier = "Standard"
    size = "S1"
  }
}

Then add the code for creating the app service. The final Terraform file looks like the one below.

terraform {    
  required_providers {    
    azurerm = {    
      source = "hashicorp/azurerm"    
    }    
  }    
} 
   
provider "azurerm" {    
  features {}    
}

resource "azurerm_resource_group" "resource_group" {
  name     = "app-service-rg"
  location = "East US"
}

resource "azurerm_app_service_plan" "app_service_plan" {
  name                = "myappservice-plan"
  location            = azurerm_resource_group.resource_group.location
  resource_group_name = azurerm_resource_group.resource_group.name

  sku {
    tier = "Standard"
    size = "S1"
  }
}

resource "azurerm_app_service" "app_service" {
  name                = "mywebapp-453627 "
  location            = azurerm_resource_group.resource_group.location
  resource_group_name = azurerm_resource_group.resource_group.name
  app_service_plan_id = azurerm_app_service_plan.app_service_plan.id

  #(Optional)
  site_config {
dotnet_framework_version = "v4.0"
    scm_type                 = "LocalGit"
  }
  
  #(Optional)
  app_settings = {
    "SOME_KEY" = "some-value"
  }

}

Now, we should run the following command to initialize Terraform.

Command: terraform init

To create an execution plan, we should run the terraform plan command

Command: terraform plan -out appservice.tfplan

To apply the plan, run the following command

Command: terraform apply "appservice.tfplan"

We can verify that the app service was created in the specified App Service plan and resource group by checking the Azure portal.

Hope you found this article useful. Stay tuned for more articles coming up on Azure App Service and Terraform.

With the average cost of a data breach at $3.86 million last year, it’s wise to employ a good backup system. More than 80% of the Fortune 500 companies use Microsoft Azure to run their businesses effortlessly, as it is simple, ever-evolving, secure, and cost-effective. So, in this article let’s explore how to back up and restore Azure Managed Disks using an Azure Backup vault.

Azure Backup services allow you to back up your data and recover it from the Microsoft Azure cloud. They back up and store the data in backup vaults. These backup vaults make certain that backups are successful by monitoring and tracking the storage containers, they optimize resources by automating maintenance tasks, and they also provide better security and access control to store and recover data.

Data sources that are supported by Azure Backup include

  • Azure Database for PostgreSQL servers,
  • Azure Blobs, and
  • Azure Disks.

Prerequisites for performing disk backup and restore operations

Backup Vault’s managed identity needs the below roles to be assigned to it for performing disk backup and restore operations:

  • Disk Backup Reader role on the Source disk that needs to be backed up.
  • Disk Snapshot Contributor role on the Resource group where backups are created and managed by Azure Backup.
  • Disk Restore Operator role on the Resource group where the disk will be restored by the Azure Backup.

To assign Azure roles, the user must have Microsoft.Authorization/roleAssignments/write permissions, such as User Access Administrator or Owner.

Steps to backup managed disks:

Create a Backup vault

  1. Go to the Backup center service in the Azure portal. Backup center enables enterprises to govern, monitor, operate, and analyze backups at scale. Jobs performed in the last 24 hours are displayed in the Overview tab. Operations such as Scheduled backup, On-demand backup, and Restore are listed along with the status of each operation (Failed, In progress, or Completed).

2. Select Vault from the Overview tab

3. In Start: Create Vault page, select Backup vault and then Continue

4. In Basics tab,

  • Under PROJECT DETAILS, select the Subscription and Resource group of the vault to be created.
  • Under INSTANCE DETAILS, type in the Backup vault name.
  • Select the region of the backup vault and backup storage redundancy
  • Select Review and create

5. Select Create. The Backup vault will be created.

Create a backup policy

  1. Select Policy from Backup center’s Overview tab

2. In Start: Create Policy page, select Datasource type as Azure Disks and Vault type is prepopulated as Backup vault. Then, select Continue.

3. In the Basics tab, type in the policy name to be created. Select Datasource type as Azure Disk and select Vault as the Backup vault that was just created. Click on Next: Schedule and Retention to go to next tab.

4. In the Schedule and retention tab,

  • Under Backup schedule, select the backup schedule frequency and specify the time when backup must happen.
  • Specify the number of days backup should be retained under Retention settings.

5. After validation, in Review and Create tab, select Create. The Backup policy is created.

Configure backup of an Azure Disk

To back up an Azure disk,

  1. Assign Disk Backup Reader role on the Source disk that needs to be backed up to the Backup vault’s managed identity.
  2. Assign Disk Snapshot Contributor role on the Snapshot Resource group to the Backup vault’s managed identity.

a)     Steps to Assign Disk Backup Reader role on the Source Disk

  1. Go to the source disk that we need to configure backup for.
  2. Select access control (IAM) and Add role assignment

3. In Role tab, search for the role Disk Backup Reader and select it

4. In Members tab, select assign access to Managed identity and select members as the Backup vault.

5. Select Review+ assign. Assignment of Disk Backup Reader role to the Backup vault is done.

b)     Steps to Assign Disk Snapshot Contributor role on the Snapshot Resource group

  1. Go to the target Snapshot resource group.
  2. Select access control (IAM) and Add role assignment

3. In Role tab, search for the role Disk Snapshot Contributor and select it

4. In Members tab, select assign access to Managed identity and select members as the Backup vault.

5. Select Review+ assign. Assignment of Disk Snapshot Contributor role to the Backup vault is done.

Steps to Backup an Azure Disk

  1. In Backup center, Select Backup from the Overview tab

2. In Start: Configure Backup, select Datasource type as Azure Disks and Vault type is prepopulated as Backup vault.

3. In Basics tab, select Datasource type as Azure Disks, select the Vault created, and then select Next.

4. In Backup policy tab, Select the backup policy.

5. In Datasources tab:

a) Click on Add/Edit and select the disks to backup.

b) Click Select after selection of disks.

c) Select Snapshot Resource Group, the resource group where snapshots of disks are stored. Once the disk backup is configured, the Snapshot Resource Group that’s assigned to a backup instance cannot be changed.

d) Select Validate. Click Next.

6. In Review and Configure tab, Select Configure Backup. The configuration of backup for the disk is done.

For the validation to be successful, we must assign the Disk Backup Reader role on the source disk to the Backup vault’s managed identity and the Disk Snapshot Contributor role on the Snapshot Resource group to the Backup vault’s managed identity.

On demand backup of an Azure Disk

  1. In the Backup Vault, go to Backup Instances and select the disk to perform on demand backup

2. Select Backup Now.

3. In Backup vault, go to Backup jobs to view the status of the backup.

Restore an Azure Disk from backup

To restore an Azure disk, we need to assign the Disk Restore Operator role to the Backup vault’s managed identity on the resource group where the disk will be restored by Azure Backup.

Steps to Assign Disk Restore Operator role on the Target Resource group

  1. Go to the target resource group.
  2. Select access control (IAM) and Add role assignment

3. In Role tab, search for the role Disk Restore Operator and select it

4. In Members tab, select assign access to Managed identity and select members as the Backup vault.

5. Select Review+ assign. Assignment of the Disk Restore Operator role to the Backup vault is done.

Steps to Restore an Azure Disk

  1. Go to Backup center -> select Backup vault -> select Restore

2. In Basics tab, Select the Backup instance as the disk that needs to be restored and then Next.

3. In Select Restore Point tab, select the required or latest restore point.

4. In Restore parameters, select Target subscription, and Target resource group. Type in the Restored disk name and select Next:Review and restore.

5. After validation, click on Restore. The restore operation is started.

Note: If validation is unsuccessful, follow the steps to Assign Disk Restore Operator role on the Target Resource group.

6. Restore operation is now completed.

While adopting DevOps practices automates and optimizes processes through technology, it all starts with the culture inside the organization—and the people who play a part in it.

Check out this infographic to learn how DevOps unifies people, process, and technology to bring better products to customers faster. Then imagine how the power of GitHub and Azure can benefit your DevOps team.

Together, Microsoft GitHub and Azure DevOps provide an end-to-end experience for development teams to easily collaborate while building and releasing code to Azure, on-premises, or any cloud. Contact us today to learn more.

This infographic offers an in-depth look at how Microsoft business analytics and AI is intelligent, trusted, and flexible. This service produces faster, more accurate insights and predictions. It also offers the most secure, compliant, and scalable system. Finally, it works with what you have.

Would you like to leverage Microsoft Business Analytics and AI for faster, more accurate insights and predictions? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you. Contact us today to learn more.

Seattle [23 Jun 2021] – CloudIQ Technologies Inc today announced it has earned the Kubernetes on Microsoft Azure advanced specialization, a validation of a solution partner’s deep knowledge, extensive experience and proven expertise in deploying and managing production workloads in the cloud using containers and managing hosted Kubernetes environments in Microsoft Azure.

Only partners that meet stringent criteria around customer success and staff skilling, as well as pass a third-party audit of their container-based workload deployment and management practices, are able to earn the Kubernetes on Azure advanced specialization.

With over 75% of global organizations expected to run containerized applications in production by 2022, many are looking for a partner with advanced skills to migrate their existing containerized workloads to the cloud, or assist them in developing cloud-native applications using container technologies, DevOps patterns, and a microservices approach.

“With our deep expertise in cloud-native architecture design, we help clients build and run scalable applications with improved security, faster release cycles, easier management, and lower costs”, said Mr. Prem Kandalu, CEO. “As a partner who has earned the Kubernetes on Microsoft Azure advanced specialization, CloudIQ will pass on the benefits of our continued collaboration with Microsoft to our clients.”

Rodney Clark, Corporate Vice President, Global Partner Solutions, Channel Sales and Channel Chief at Microsoft added, “The Kubernetes on Microsoft Azure advanced specialization highlights the partners who can be viewed as most capable when it comes to deploying and managing containerized applications in Azure. CloudIQ Technologies clearly demonstrated that they have both the skills and the experience to deliver best-in-class cloud-native capabilities to customers with Azure.”

About CloudIQ Technologies

CloudIQ is a leading cloud consulting and solutions firm that helps businesses envision new products, create innovative business models, and deliver the next level of customer experiences by leveraging the cloud. From standalone cloud projects to enterprise-wide cloud architecture design you can rely on our cloud engineering expertise.

As a Microsoft Gold Partner (Cloud Platform), earner of the Modernization of Web Applications on Microsoft Azure and Kubernetes on Microsoft Azure advanced specializations, Kubernetes Certified Service Provider (KCSP) and Kubernetes Training Partner (KTP), we serve as trusted advisors to Fortune 500 organizations and leverage our deep industry expertise in building cloud-native solutions to help our clients realize the cost, scale and security benefits of the cloud.

Without artificial intelligence (AI), organizing and extracting insights from vast amounts of enterprise data would be a nearly impossible task. Choosing the right AI capabilities is essential to successful initiatives. This infographic presents the four guiding principles behind Microsoft #Azure #AI and why it remains the top choice for today’s leading enterprises.

Would you like to leverage Azure AI for your business? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you transform massive amounts of raw information into meaningful insights for your business. Contact us today to learn more.

Have you been looking for a fully managed, secure platform for your web apps? Azure App Service is built to help you build, deploy, and scale your web apps and APIs on your terms. Work with .NET, .NET Core, Node.js, Java, Python, or PHP, in containers or running on Windows or Linux. Check out this infographic and contact CloudIQ Technologies to learn more.

Would you like to modernize your apps using Azure App Service? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you. Contact us today to learn more.

Azure SQL Databases are intelligent and always up to date. It is the only cloud with evergreen SQL which never needs to be patched or updated. This infographic presents the benefits of Azure SQL Database and Azure Advance Threat Protection.

Would you like to migrate SQL Server databases to the Azure cloud? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help address any of your IT infrastructure upgrade needs. Contact us today to learn more.

Moving Windows Server and SQL workloads to Azure provides flexible, scalable, and highly available cloud infrastructure. It also supports rapid innovation and digital transformation, freeing you to focus on your mission. This infographic presents the benefits of running Windows Server and SQL Server on Azure. 

Would you like to move your Windows Server and SQL workloads to Azure? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help address any of your IT infrastructure upgrade needs. Contact us today to learn more.

There are four different ways of accessing Azure Data Lake Storage Gen2 in Databricks. However, using the ADLS Gen2 storage account access key directly is the most straightforward option. Before we dive into the actual steps, here is a quick overview of the entire process

  • Understand the features of Azure Data Lake Storage (ADLS)
  • Create ADLS Gen 2 using Azure Portal
  • Use Microsoft Azure Storage Explorer
  • Create Databricks Workspace
  • Integrate ADLS with Databricks
  • Load Data into a Spark DataFrame from the Data Lake
  • Create a Table on Top of the Data in the Data Lake

Microsoft Azure Data Lake Storage (ADLS) is a fully managed, elastic, scalable, and secure file system that supports HDFS semantics and works with the Apache Hadoop ecosystem. It is built for running large-scale analytics systems that require large computing capacity to process and analyze large amounts of data.

Features:

Limitless storage

ADLS is suitable for storing all types of data coming from different sources like devices, applications, and much more. It also allows users to store relational and non-relational data. Additionally, it doesn’t require a schema to be defined before data is loaded into the store. ADLS can store virtually any size of data, and any number of files. Each ADLS file is sliced into blocks and these blocks are distributed across multiple data nodes. There is no limitation on the number of blocks and data nodes.

Auditing

ADLS creates audit logs for all operations performed in it.

Access Control

ADLS provides access control through the support of access control lists (ACL) on files and folders stored in its infrastructure. It also manages authentication through the integration of AAD based on OAuth tokens from supported identity providers.

Create ADLS Gen2 using Portal:

  1. Log in to the portal.
  2. Search for “Storage Account”
  3. Click “Add”

4. Choose Subscription and Resource Group.

5. Give storage account name, location, kind, and replication.

6. In the Advanced Tab, set Hierarchical namespace to Enabled

7. Click “Review+Create”
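
If you prefer to script this instead of clicking through the portal, the rough sketch below uses the Azure SDK for Python (the azure-identity and azure-mgmt-storage packages) to create a StorageV2 account with the hierarchical namespace enabled. The subscription, resource group, account name, region, and SKU are placeholder assumptions; check the current SDK documentation before relying on the exact model names.

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountCreateParameters, Sku

# Placeholder subscription, resource group, account name, and region.
subscription_id = "<subscription-id>"
client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Enabling the hierarchical namespace (is_hns_enabled) is what makes the account ADLS Gen2.
poller = client.storage_accounts.begin_create(
    resource_group_name="<resource-group>",
    account_name="<storageaccountname>",
    parameters=StorageAccountCreateParameters(
        sku=Sku(name="Standard_LRS"),
        kind="StorageV2",
        location="eastus",
        is_hns_enabled=True,
    ),
)
account = poller.result()  # wait for the deployment to finish
print(account.name, account.provisioning_state)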

Microsoft Azure Storage Explorer

Microsoft Azure Storage Explorer is a standalone app that makes it easy to work with Azure Storage data on Windows, macOS, and Linux. Microsoft has also provided this functionality within the Azure portal, where it is currently in preview mode.

  1. Navigate back to your data lake resource in Azure and click ‘Storage Explorer (preview)’.

2. Right-click on ‘CONTAINERS’ and click ‘Create file system’. This will be the root path for our data lake.

3. Name the file system and click ‘OK’.

4. Now, click on the file system you just created and click ‘New Folder’. This is how we will create our base data lake zones. Create folders.

5. To upload data to the data lake, you will need to install Azure Data Lake explorer using the following link.

6. Once you install the program, click ‘Add an account’ in the top left-hand corner, log in with your Azure credentials, keep your subscriptions selected, and click ‘Apply’.

7. Navigate down the tree in the explorer panel on the left-hand side until you get to the file system you created, double click on it. Then navigate into the folder. There you can upload/ download files from your local system.

8. Click “Upload” > “Upload Files”. You can get sample data set from here.

Sample Folder structure:

Create Databricks Workspace

  1. On the Azure home screen, click ‘Create a Resource’

2. In the ‘Search the Marketplace’ search bar, type ‘Databricks’ and you should see ‘Azure Databricks’ pop up as an option. Click that option.

3. Click ‘Create’ to begin creating your workspace.

4. Use the same resource group you created or selected earlier. Then, enter a workspace name.

5. Select ‘Review and Create’.

6. Once the deployment is complete, click ‘Go to resource’ and then click ‘Launch Workspace’ to get into the Databricks workspace.

Integrate ADLS with Databricks:

There are four ways of accessing Azure Data Lake Storage Gen2 in Databricks:

  1. Mount an Azure Data Lake Storage Gen2 filesystem to DBFS using a service principal and OAuth 2.0 (a sketch of this option follows the list).
  2. Use a service principal directly.
  3. Use the Azure Data Lake Storage Gen2 storage account access key directly.
  4. Pass your Azure Active Directory credentials, also known as a credential passthrough.
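
For comparison, option 1 (mounting the filesystem to DBFS with a service principal and OAuth 2.0) typically looks like the sketch below. Every angle-bracket value is a placeholder for your own Azure AD application and storage details, and the client secret is read from a Databricks secret scope rather than hard-coded.

# Sketch of option 1: mount ADLS Gen2 to DBFS using a service principal and OAuth 2.0.
# All angle-bracket values are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-client-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<secret-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)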

Let’s use option 3.

1. This option is the most straightforward and requires you to run the command, setting the data lake context at the start of every notebook session. Databricks Secrets can be used when setting these configurations (a sketch of this appears after step 9 below).

2. To set the data lake context, create a new Python notebook, and paste the following code into the first cell:

spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    ""
)

3. Replace ‘<storage-account-name>’ with your storage account name.

4. In between the double quotes on the third line, we will be pasting in an access key for the storage account that we grab from Azure

5. Navigate to your storage account in the Azure Portal and click on ‘Access keys’ under ‘Settings’.

6. Click the copy button, and paste the key1 Key in between the double quotes in your cell

7. Attach your notebook to the running cluster and execute the cell. If it worked, you should see the following:

8. If your cluster is shut down, or if you detach the notebook from a cluster, you will have to re-run this cell to access the data.

9. Copy the below command in a new cell, filling in your relevant details, and you should see a list containing the file you updated.

dbutils.fs.ls("abfss://<file-system-name>@<storage-account-
name>.dfs.core.windows.net/<directory-name>")
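
As mentioned in step 1, the access key does not have to be pasted into the notebook in clear text. If the key has already been stored in a Databricks secret scope (the scope and key names below are hypothetical), the same configuration can be set like this:

# Read the access key from a Databricks secret scope instead of pasting it into the notebook.
# "adls-scope" and "storage-account-key" are hypothetical names; create them first with the Databricks CLI.
storage_account_key = dbutils.secrets.get(scope="adls-scope", key="storage-account-key")

spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    storage_account_key
)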

Load Data into a Spark DataFrame from the Data Lake

Towards the end of the Microsoft Azure Storage Explorer section, we uploaded a sample CSV file into ADLS. We will now see how we can read this CSV file from Spark.

We can get the file location from the dbutils.fs.ls command we ran earlier – see the full path as the output.

Run the command given below:

#set the data lake file location (replace the placeholders with your container and storage account names):
file_location = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/raw/covid19/johns-hopkins-covid-19-daily-dashboard-cases-by-states.csv"

#read in the data to dataframe df
df = (spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .option("delimiter", ",")
      .load(file_location))

#display the dataframe
display(df)

Create a table on top of the data in the data lake

In the previous section, we loaded the data from a CSV file into a DataFrame so that it can be accessed using the Python Spark API. Now, we will create a Hive table in Spark with the data in an external location (ADLS), so that the data can be accessed using SQL instead of Python code.

In a new cell, copy the following command:

%sql
CREATE DATABASE covid_research

Next, create the table pointing to the proper location in the data lake.

%sql
CREATE TABLE IF NOT EXISTS covid_research.covid_data
USING CSV
LOCATION 'abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/raw/covid19/johns-hopkins-covid-19-daily-dashboard-cases-by-states.csv'
OPTIONS (header "true", inferSchema "true")

 You should see the table appear in the data tab on the left-hand navigation pane.

Run a select statement against the table.

%sql
SELECT * FROM covid_research.covid_data

That concludes our step-by-step guide on accessing Azure Data Lake Storage Gen2 in Databricks, using the ADLS Gen2 storage account access key directly.

Hope you found this guide useful, stay tuned for more.

The scalability, flexibility, cost-efficiency, and improved performance that come with moving to the cloud are becoming too attractive for companies to ignore. Many organizations have started moving to the cloud as a cost-effective option to manage their IT portfolio, avoid expensive capex for the purchase of new servers, and remove the complexities of managing on-premises architecture.

Irrespective of a company’s size, migrating to the cloud is certainly quite an undertaking. Fortunately, Microsoft has created a unique platform with a range of tools to help make the migration fast and smooth, while minimizing the risk and impact to your business.

Microsoft Azure is one of the leading cloud computing service providers that allows businesses to use cloud resources on a pay per use model, therefore you can pay for only what you need and how long you need it. Azure provides multiple options right from Infrastructure as a Service (IaaS) to Platform as a Service (PaaS) to Software as a Service (SaaS), so you can choose from a simple lift and shift approach to a more complex application modernization approach.

The migration journey begins with an Assessment of your current setup. There are tools like Azure Migrate that Azure provides for this purpose and additionally you can leverage other assessment tools from Azure migration partner ecosystem. Azure Migrate provides a centralized hub to assess and migrate to Azure on-premise servers, infrastructures, applications, and data. With Azure, you can also assess how your workloads will perform, plan, and implement your migration strategy accordingly.

5 Azure Migration Strategies

Here are five strategies that are adopted widely for migrating an application to Azure cloud.

1. Rehosting

Commonly known as “lift and shift”, this is an approach to migrate applications from an on-premise environment to the cloud with no changes to the underlying applications. This is the most popular migration approach as it allows quick migration with little risk of disruption by employing real-time replication during the transition process.

2. Refactoring

This is also known as “repacking”. It involves making small changes to the code and configuration of the application to ensure they are more compatible with the cloud so you can connect them easily to Azure-native infrastructure. This can improve the scalability and maximize the operational cost-efficiency of the platform.

3. Rearchitecting

Also known as “redesigning”, this strategy involves modifying or extending the code base of an application to optimize it to run on Azure. Rearchitecting is a time-consuming migration approach, but still, it offers infinite scalability.

4. Rebuilding

This strategy involves discarding the old application and rebuilding an application or workload from the ground up using the Azure Platform as a Service (PaaS). In this migration strategy, you manage the applications and services you develop, while Azure manages the platform and infrastructure required to run it.

5. Replacing

Under this approach, all the underlying infrastructure, middleware, application software, and application data are in the cloud and managed by Azure in Microsoft datacenters. This is used for greater efficiency and scalability.

3-Step Migration Process

Once you decide on your migration approach, the actual migration to the cloud is a 3-step process (Assess – Migrate – Optimize). Before you get started with the migration, there are a few preliminary considerations to ensure your cloud environment is ready to receive your workloads. You need to ensure that your virtual data center in the cloud contains elements comparable to your on-premises environment. Building the virtual data center in the cloud is a streamlined process and includes the following:

1. Identity

To ensure authenticated access for users between your on-premises environment and the workloads that you have migrated to the cloud, you need a robust identity management solution. For this purpose, you can use Azure Active Directory (Azure AD) or other similar solutions.

2. Storage

Migrating to the cloud requires a storage platform that meets the performance needs of your migrated workloads. You can choose from different storage types and configure exact storage requirements based on workloads to ensure security and reliability. You just need to enter a few details to get the right storage for your migration project.

3. Networking

Networking is the backbone of the data center. When migrating to the cloud, you need to keep the applications in the same subnets and IP address ranges to ensure a seamless migration. You can create a virtual network to maintain the same performance and stability you had in the on-premises data center.

4. Connectivity

During migration, you’ll transfer a large amount of data to the cloud, so it is wise to opt for a dedicated connectivity option to ensure smooth data transfer and the best user experience. For this purpose, you can use Azure ExpressRoute, as it provides a faster, private connection to Azure and ensures performance and security.

Now it’s time to begin your migration journey to the cloud.

Migration Phase 1: Assessment

Now that you have a better understanding of Azure and how it fits into your migration strategy, it’s time to assess your existing infrastructure. Here are four steps to do that:

1. Identification of application and server dependencies

Begin with inventory and assessment of on-premises IT resources to identify opportunities to optimize the IT environment and prioritize which applications and workloads are ideal for migration. Determining your priorities and objectives early can help you have a seamless migration process.

2. Assessment of on-premises applications and servers

Your organization may run hundreds or thousands of servers and virtual machines. You need consolidated planning and a perfect tool to shift them to the cloud. Microsoft offers Azure Migrate service to provide automation for the assessment of on-premises workloads. Ultimately, the goal of this assessment phase is to collect server and application information, including configuration and usage.

3. Configuration analysis

Configuration analysis will help you understand which of your workloads can be migrated with no modifications, which ones require a few modifications, and which workloads are incompatible with the current installation. Essentially this step helps you ensure the proper functioning of the workloads on the cloud.

4. Cost planning

The final step of the assessment phase is to collect resource usage such as CPU, memory, and storage to forecast costs and expenditures. This helps in ascertaining the actual usage of your workloads and ensures that your choice meets both performance and economic targets.

Migration Phase 2: Execute Migration

After you’ve completed discovery and assessment, now it’s time to prepare for the next step – migration.  The lift-and-shift method most often employed for server or VM migration is real-time replication, due to its flexibility and capability in staged migration.

1. Real-time Replication

This involves creating a copy of the workload in the cloud and allowing asynchronous replication to keep the copy and the workload in sync. Replication also lets groups of virtual machines be connected to the cloud. Real-time replication also allows the old workload to remain online and accessible during the migration to ensure zero disruptions.

2. Testing

Once the replication is complete, start your application or workloads in an isolated environment that mimics the cloud production environment. This lets you test the application without impacting the on-premises or cloud production systems. When you’re fully satisfied, it’s time to perform the final migration.

Migration Phase 3: Optimize

Once the migration phase is complete, you need to ensure a seamless transition of operating workloads in the cloud. This is what the optimize phase is all about.

1. Secure cloud resources

Know the security controls and capabilities of the new cloud-based application to ensure that the security measures are working and responding properly. You should become familiar with the capabilities of Azure Security Center, such as centralized policy management, continuous security assessment, actionable recommendations, and more.

2. Protecting Data

Ensure that workloads and data have proper backup, disaster recovery, encryption, and other measures in place to protect your business from risks. Azure offers multiple mechanisms such as Azure Disk Encryption, Azure Backup, and Azure Site Recovery to protect your data.

3. Monitoring Cloud Health

Azure offers many monitoring services to ensure you have full visibility into your current system status and get unique insights into your applications and infrastructure. The basic monitoring services include Azure Monitor, Service Health, and Azure Advisor. A few of the premium monitoring services include Application Insights, Azure Log Analytics, and Network Watcher.

There are many options and reasons for migrating workloads to Azure.

With this straightforward guide, migrating to Azure won’t be a complex task anymore. With a proper plan and clearly mapped-out objectives, you can ensure a successful Azure migration.

Want to know the key benefits of using Windows Server and SQL Server with Microsoft Azure? Here are four of them: reducing costs and streamlining IT resources; modernizing by migrating to a flexible, open cloud; innovating with new apps or managing existing server apps with unlimited flexibility; and ensuring data protection, security, and business continuity.

Would you like to modernize digital processes to improve profitability and ensure data security? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you. Contact us today to learn more.

Azure Databricks provides comprehensive end-to-end diagnostic logs of activities performed by Azure Databricks users, allowing your enterprise to monitor detailed Azure Databricks usage patterns.

In this article, we’re going to look at sending the logs of Azure Databricks workspace to log analytics workspace using diagnostics settings present in the Databricks workspace.

Here are the pre-requisites and steps to enable diagnostics setting for Azure Databricks

Pre-requisites:
  1. A user with Owner or Contributor access on the subscription or resource group where the Databricks workspace is deployed.
  2. An Azure Databricks workspace. Diagnostic logging for the Azure Databricks service is available only with the Premium plan.
Steps:
  1. Login to Azure portal.
  2. Select the Databricks workspace.

3. Select the diagnostics settings.

4. Now click “+ Add Diagnostics Settings”.

5. Azure Databricks provides diagnostic logs for the following services:

  • DBFS
  • Clusters
  • Pools
  • Accounts
  • Jobs
  • Notebook
  • SSH
  • Workspace
  • Secrets
  • SQL Permissions

6. Here we are going to send the logs to the log analytics workspace.

7. Select all the logs you want and send them to log analytics. Here we’re sending cluster logs.

8. Click Save.

9. Allow some time to ingest the logs to log analytics workspace.

10. Now go to the log analytics workspace where the diagnostics are configured.

11. Select Logs. Using Kusto Query Language (KQL), we can now query the data sent from the Databricks workspace.

12. The Databricks log tables are found under the LogManagement category.

Databricks Monitoring Dashboard

Here is a simple Databricks monitoring dashboard we created for:

  • Cluster availability
  • Failed job trend
  • Success vs failed job trend

We hope this article helps you set up the right configuration to send the logs of your Azure Databricks workspace to a log analytics workspace and build a Databricks monitoring dashboard.

Did you know migrating to Microsoft Azure can reduce your data center footprint by 73%?

Take advantage of your current investments and IT skills in Microsoft technologies. Microsoft applications and Azure have been built to work better together with flexibility, high compatibility and hybrid capabilities. 

Check out this infographic to learn how you can get unparalleled cost savings, easily plan migrations, avoid the complexity of multi-vendor support, and modernize your applications in the cloud with the leader you already trust.

Migrate to Azure at your own pace with confidence and support from CloudIQ. At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you modernize workloads with Azure. Contact us today to learn more.

Break down the cloud journey with four stages of the process—starting with a pre-migration assessment and then looking at migration, post-migration, and optimization. Microsoft Azure has you covered with tools created specifically for you.

Would you like to modernize your apps and data on Azure? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you. Contact us today to learn more.

It’s time to create and implement business and technology strategies powered by the cloud. Here is your complete game plan. Check the “Cloud Adoption Framework” infographic to plan your strategy to modernize and innovate. 

Would you like to migrate to the cloud? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help you successfully adopt the cloud. Contact us today to learn more.

In our previous blog on getting started with Azure Databricks, we looked at Databricks tables.  In this blog, we will look at a type of Databricks table called Delta table and best practices around storing data in Delta tables.

1. Delta Lake

Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs. A Databricks Delta table is a table that uses Delta Lake as its data source, similar to how the table in the previous blog used a CSV file as its data source.
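
As a minimal sketch (the column names below are illustrative and not the actual schema used in this post), a Delta table is created and populated with ordinary Spark SQL, with USING delta selecting Delta Lake as the storage format:

-- illustrative schema; adjust column names and types to your data
CREATE TABLE IF NOT EXISTS business.inventory (
  ItemId INT,
  ItemName STRING,
  Quantity INT)
USING delta

INSERT INTO business.inventory VALUES (1, 'Widget', 100), (2, 'Gadget', 250)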

2. Table which is not partitioned

When we create a delta table and insert records into it, Databricks loads the data into multiple small files.  You can see the multiple files created for the table “business.inventory” below

3. Partitioned table

Partitioning involves putting different rows into different tables.  E.g., if we have an address table with addresses in the US, the addresses might be stored in 50 different tables corresponding to the 50 states in the US.  A view with a union might be created over all of them to provide a complete view of all addresses.

Sample code to create a table partitioned by date column is given below:

CREATE TABLE events (
  date DATE,
  eventId STRING,
  eventType STRING,
  data STRING)
USING delta
PARTITIONED BY (date) 

The table “business.sales” given below is partitioned by InvoiceDate.  You can see that there is a folder created for each InvoiceDate and within the folders, there are multiple files that store the data for this table.

This partitioning will be useful when we have queries selecting records from this table with InvoiceDate in the WHERE clause.

E.g.:
SELECT SLSDA_ID, RcdType, DistId
FROM business.sales
WHERE InvoiceDate = '2013-01-01'

In total, there are 40,545 files for this table, which you can see in the screenshot below.

4. OPTIMIZE

SMALL FILE PROBLEM

Historical and new data is often written in very small files and directories.  This data may be spread across a data center or even across the world (that is, not co-located).  The result is that a query on this data may be very slow due to

  • network latency
  • volume of file metadata

The solution is to compact many small files into one larger file.

The OPTIMIZE command invokes the bin-packing (compaction) algorithm to coalesce small files into larger ones.  Small files are compacted together into new larger files of up to 1 GB.
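
For the partitioned table shown earlier, compaction is a single statement:

OPTIMIZE business.sales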

You can see below that the OPTIMIZE command has removed the 40,545 files and added 2,378 files in their place.  Also, observe that after optimization the size of the table has decreased from 1.49 GB to 1.08 GB.

5. Optimize table which is not partitioned

Optimize will compact the small files for tables that are not partitioned too.

The business.finance_transactions_silver table is not partitioned and currently has 64 files with a total size of 858 MB.

Running the Optimize command coalesces the 64 files to 1 file

Note that “partitionsOptimized” is 1 in this case.  Previously, for the partitioned table, “partitionsOptimized” was 2509.  The OPTIMIZE command coalesces small files within a partition only.  If the table is not partitioned, the whole table is considered a single partition.

6. ZORDER
  • Data Skipping is a performance optimization that aims at speeding up queries that contain filters (WHERE clauses).
    As new data is inserted into a Databricks Delta table, file-level min/max statistics are collected for all columns (including nested ones) of supported types. Then, when there’s a lookup query against the table, Databricks Delta first consults these statistics to determine which files can safely be skipped.  This is done automatically and no specific commands are required to be run for this.
  • Z-Ordering is a technique to co-locate related information in the same set of files.
    Z-Ordering maps multidimensional data to one dimension while preserving the locality of the data points.

Given a column that you want to perform ZORDER on, say OrderColumn, Delta

  • Takes existing parquet files within a partition.
  • Maps the rows within the parquet files according to OrderColumn using the Z-order curve algorithm.
  • In the case of only one column, the mapping above becomes a linear sort
  • Rewrites the sorted data into new parquet files.

Note: The table’s partition column cannot also be used as a ZORDER column.

Syntax for ZORDER is

OPTIMIZE tablename
ZORDER BY (OrderColumn) 
7. Best practices

a. PARTITION BY

  • Partition the table by a column which is used in the WHERE clause or ON clause (join).  The most commonly used partition column is the date.
  • Use columns with low cardinality.  If the cardinality of a column will be very high, do not use that column for partitioning. For example, if you partition by a column userId and if there can be 1M distinct user IDs, then that is a bad partitioning strategy.
  • Amount of data in each partition: You can partition by a column if you expect data in that partition to be at least 1 GB.  Partitioning is not required for smaller tables.
  • Partition by a single column only.

b. OPTIMIZE

  • OPTIMIZE is required for all tables to which we write data continuously on a daily basis.
  • OPTIMIZE is not required for tables that have static data/reference data which are rarely updated.
  • There is a cost associated with OPTIMIZE (Running Optimize command for sales took 6.64 minutes).  We should run it more often (daily) if we want better end-user query performance.  We should run it less often if we want to optimize costs.

c. ZORDER BY

  • If we expect a column to be commonly used in query predicates and if that column has high cardinality (that is, a large number of distinct values), then use ZORDER BY.
  • We can specify multiple columns for ZORDER BY as a comma-separated list (see the sketch below). However, the effectiveness of the locality drops with each additional column.
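
A minimal sketch with two hypothetical columns (the column names are illustrative and not taken from the actual table schema):

-- CustomerId and ProductId are hypothetical high-cardinality columns
OPTIMIZE business.sales
ZORDER BY (CustomerId, ProductId)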

Migrating your IT infrastructure to the cloud has a ton of benefits. Whether you’re looking to improve security and become GDPR compliant, cut your total cost of ownership, or promote teamwork and innovation by integrating AI capabilities, the cloud provides a solution to your IT problems.

Would you like to upgrade your IT infrastructure to the cloud? At CloudIQ Technologies, we have a knowledgeable and professional team ready to help address any of your IT infrastructure upgrade needs. Contact us today to learn more.

Backing up on-premises resources to the cloud leverages the power and scale of the cloud to deliver high-availability with no maintenance or monitoring overhead. With Azure Backup service the benefits keep adding up, right from data security to centralized monitoring and management.

Azure Backup service uses the MARS agent to back up files, folders, and system state from on-premises machines and Azure VMs. The backups are stored in a Recovery Services vault.

In this article, we will look at how to back up on-premise files and folders using Microsoft Azure Recovery Services (MARS) agent.

There are two ways to run the MARS agent:

  • Directly on on-premises Windows machines.
  • On Azure VMs that run Windows side by side with the Azure VM backup extension.

Here is the step-by-step process.

Create a Recovery Services vault

1. Sign in to the Azure portal.
2. On the Recovery Services vaults dashboard, select “Add”.

3. The Recovery Services vault dialog box opens. Provide values of the Name, Subscription, Resource group, and Location.
4. Select “Create”.

Download the MARS agent

1. Download the MARS agent so that you can install it on the machines that you want to back up.
2. In the vault, select “Backup”.
3. Select On-premises for “Where is your workload running?”
4. Select Files and folders for “What do you want to back up?”
5. Select “Prepare Infrastructure”.

6. For “Prepare infrastructure”, under Install Recovery Services agent, download the MARS agent.
7. Select “Already downloaded or using the latest Recovery Services Agent”, and then download the vault credentials.
8. Select “Save”.

Install and register the agent

1. Run the MARSagentinstaller.exe file on the VM.
2. In the “MARS Agent Setup Wizard”, select “Installation Settings”.
3. Choose where to install the agent and choose a location for the cache. Select “Next”.

  • The cache is for storing data snapshots before they are sent to the Recovery Services vault.
  • The cache location should have free space equal to at least 5 percent of the size of the data you’ll back up.

4. For Proxy Configuration, specify how the agent that runs on the Windows machine will connect to the internet. Then select “Next”.

5. For Installation, review, and select “Install”.
6. After the agent is installed, select “Proceed to Registration”.

7. In Register Server Wizard > Vault Identification, browse to and select the credentials file. Then select “Next”.

8. On the “Encryption Setting” page, specify a user-defined passphrase, which is used to encrypt and decrypt backups for the machine.

9. Save the passphrase in a secure location. It is needed while restoring a backup.
10. Select “Finish”.

Create a backup policy

Steps to create a backup policy:

1. Open the MARS agent console.
2. Under “Actions”, select “Schedule Backup”.

3. In the “Schedule Backup Wizard”, select “Getting started” and click “Next”.
4. Under “Select Items to Back up”, select “Add Items”.

5. Select items to back up, and select OK.

6. On the “Select Items to Back Up” page, select “Next”.
7. Specify when to take daily or weekly backups in the “Specify Backup Schedule” page and select “Next”.
8. It is possible to schedule up to three backups per day, and weekly backups can be run as well.

9. On the “Select Retention Policy” page, specify how to store copies of your data. And select “Next”.
10. On the Confirmation page, review the information, and then select “Finish”.

11. After the wizard finishes creating the backup schedule, select “Close”.

We hope this step-by-step guide helps you back up on-premises files and folders using the Microsoft Azure Recovery Services (MARS) agent.

Many businesses are struggling to find the talent and capacity to create and manage their machine learning models to actually unlock the insights within their data. Here is an infographic that shows how Microsoft Azure Machine Learning streamlines this process to make modeling accessible to all businesses. 

What’s holding your business back from using AI to turn your data into actionable insights? Azure Machine Learning makes AI more accessible to businesses of all sizes and experience levels by reducing cost and helping you create and manage your models. But you don’t have to go it alone. We can help you assess your business needs and adopt the right AI solution. Contact us today to learn how we can help transform your business with AI.

The distributed nature of cloud applications requires a messaging infrastructure that connects the components and services, ideally in a loosely coupled manner in order to maximize scalability. In this article let’s explore the asynchronous messaging options in Azure.

At an architectural level, a message is a datagram created by an entity (producer), to distribute information so that other entities (consumers) can be aware and act accordingly. The producer and the consumer can communicate directly or optionally through an intermediary entity (message broker).  

Messages can be classified into two main categories. If the producer expects an action from the consumer, that message is a command. If the message informs the consumer that an action has taken place, then the message is an event.

Commands

The producer sends a command with the intent that the consumer(s) will perform an operation within the scope of a business transaction.

A command is a high-value message and must be delivered at least once. If a command is lost, the entire business transaction might fail. Also, a command shouldn’t be processed more than once. Doing so might cause an erroneous transaction. A customer might get duplicate orders or billed twice.

Commands are often used to manage the workflow of a multistep business transaction. Depending on the business logic, the producer may expect the consumer to acknowledge the message and report the results of the operation. Based on that result, the producer may choose an appropriate course of action.

Events

An event is a type of message that a producer raises to announce facts.

The producer (known as the publisher in this context) has no expectations that the events will result in any action.

Interested consumer(s) can subscribe, listen for events, and take action depending on their consumption scenario. Events can have multiple subscribers or no subscribers at all. Two different subscribers can react to an event with different actions and not be aware of one another.

The producer and consumer are loosely coupled and managed independently. The consumer isn’t expected to acknowledge the event back to the producer. A consumer that is no longer interested in the events can unsubscribe. The consumer is removed from the pipeline without affecting the producer or the overall functionality of the system.

There are two categories of events:

  • The producer raises events to announce discrete facts. A common use case is event notification. For example, Azure Resource Manager raises events when it creates, modifies, or deletes resources. A subscriber of those events could be a Logic App that sends alert emails.
  • The producer raises related events in a sequence, or a stream of events, over a period of time. Typically, a stream is consumed for statistical evaluation. The evaluation can be done within a temporal window or as events arrive. Telemetry is a common use case, for example, health and load monitoring of a system. Another case is event streaming from IoT devices.

A common pattern for implementing event messaging is the Publisher-Subscriber pattern.

Role and benefits of a message broker

An intermediate message broker provides the functionality of moving messages from producer to consumer and can offer additional benefits.

Decoupling

A message broker decouples the producer from the consumer in the logic that generates and uses the messages, respectively. In a complex workflow, the broker can encourage business operations to be decoupled and help coordinate the workflow.

Load balancing

Producers may post a large number of messages that are serviced by many consumers. Use a message broker to distribute processing across servers and improve throughput. Consumers can run on different servers to spread the load. Consumers can be added dynamically to scale out the system when needed or removed otherwise.

Load leveling

The volume of messages generated by the producer or a group of producers can be variable. At times there might be a large volume causing spikes in messages. Instead of adding consumers to handle this work, a message broker can act as a buffer, and consumers gradually drain messages at their own pace without stressing the system.

Reliable messaging

A message broker helps ensure that messages aren’t lost even if communication fails between the producer and consumer. The producer can post messages to the message broker and the consumer can retrieve them when communication is re-established. The producer isn’t blocked unless it loses connectivity with the message broker.

Resilient messaging

A message broker can add resiliency to the consumers in your system. If a consumer fails while processing a message, another instance of the consumer can process that message. The reprocessing is possible because the message persists in the broker.

Technology choices for a message broker

Azure provides several message broker services, each with a range of features.

Azure Service Bus

Azure Service Bus queues are well suited for transferring commands from producers to consumers. Here are some considerations.

Pull model

A consumer of a Service Bus queue constantly polls Service Bus to check if new messages are available. The client SDKs and Azure Functions trigger for Service Bus abstract that model. When a new message is available, the consumer’s callback is invoked, and the message is sent to the consumer.

Guaranteed delivery

Service Bus allows a consumer to peek the queue and lock a message from other consumers.

It’s the responsibility of the consumer to report the processing status of the message. Only when the consumer marks the message as consumed, Service Bus removes the message from the queue. If a failure, timeout, or crash occurs, Service Bus unlocks the message so that other consumers can retrieve it. This way messages aren’t lost in the transfer.

Message ordering

If you want consumers to get the messages in the order they are sent, Service Bus queues guarantee first-in-first-out (FIFO) ordered delivery by using sessions. A session can have one or more messages.

Message persistence

Service Bus queues support temporal decoupling. Even when a consumer isn’t available or is unable to process the message, the message remains in the queue.

Checkpoint long-running transactions

Business transactions can run for a long time. Each operation in the transaction can have multiple messages. Use checkpointing to coordinate the workflow and provide resiliency in case a transaction fails.

Hybrid solution

Service Bus bridges on-premises systems and cloud solutions. On-premises systems are often difficult to reach because of firewall restrictions. Both the producer and consumer (either can be on-premises or the cloud) can use the Service Bus queue endpoint as the pickup and drop off location for messages.

Topics and subscriptions

Service Bus supports the Publisher-Subscriber pattern through Service Bus topics and subscriptions.

Azure Event Grid

Azure Event Grid is recommended for discrete events. Event Grid follows the Publisher-Subscriber pattern. When event sources trigger events, they are published to Event Grid topics. Consumers of those events create Event Grid subscriptions by specifying event types and an event handler that will process the events. If there are no subscribers, the events are discarded. Each event can have multiple subscriptions.

Push Model

Event Grid propagates messages to the subscribers in a push model. Suppose you have an event grid subscription with a webhook. When a new event arrives, Event Grid posts the event to the webhook endpoint.

Custom topics

Create custom Event Grid topics, if you want to send events from your application or an Azure service that isn’t integrated with Event Grid.

High throughput

Event Grid can route 10,000,000 events per second per region. The first 100,000 operations per month are free.

Resilient delivery

Even though successful delivery for events isn’t as crucial as commands, you might still want some guarantee depending on the type of event. Event Grid offers features that you can enable and customize, such as retry policies, expiration time, and dead lettering.

Azure Event Hubs

When working with an event stream, Azure Event Hubs is the recommended message broker. Essentially, it’s a large buffer that’s capable of receiving large volumes of data with low latency. The data received can be read quickly through concurrent operations. You can transform the data received by using any real-time analytics provider. Event Hubs also provides the capability to store events in a storage account.

Fast ingestion

Event Hubs are capable of ingesting millions of events per second. The events are only appended to the stream and are ordered by time.

Pull model

Like Event Grid, Event Hubs also offers Publisher-Subscriber capabilities. A key difference between Event Grid and Event Hubs is in the way event data is made available to the subscribers. Event Grid pushes the ingested data to the subscribers whereas Event Hub makes the data available in a pull model. As events are received, Event Hubs appends them to the stream. A subscriber manages its cursor and can move forward and back in the stream, select a time offset, and replay a sequence at its pace.

Partitioning

A partition is a portion of the event stream. The events are divided by using a partition key. For example, several IoT devices send device data to an event hub. The partition key is the device identifier. As events are ingested, Event Hubs moves them to separate partitions. Within each partition, all events are ordered by time.

Event Hubs Capture

The Capture feature allows you to store the event stream to Azure Blob storage or Data Lake Storage. This way of storing events is reliable because even if the storage account isn’t available, Capture keeps your data for a period, and then writes to the storage after it’s available.

We hope this quick start guide helps you get started on Azure messaging and event-driven architecture.

Azure Databricks lets you spin up clusters and build quickly in a fully managed Apache Spark environment with the global scale and availability of Azure. And of course, for any production-level solution, monitoring is a critical aspect.

Azure Databricks comes with robust monitoring capabilities for custom application metrics, streaming query events, and application log messages. It allows you to push this monitoring data to different logging services.

In this article, we will look at the setup required to send application logs and metrics from Microsoft Azure Databricks to a Log Analytics workspace.

Prerequisites
  1. Clone the repository mentioned below
    https://github.com/mspnp/spark-monitoring.git
  2. Azure Databricks workspace
  3. Azure Databricks CLI
    Databricks workspace personal access token is required to use the CLI
    You can also use the Databricks CLI from Azure Cloud Shell.
  4. A Java IDE with the following resources
    Java Development Kit (JDK) version 1.8
    Scala language SDK 2.11
    Apache Maven 3.5.4
Building the Azure Databricks monitoring library with Docker

After cloning the repository, open a terminal in the cloned path.

Run the build command as follows.
Windows:

docker run -it --rm -v %cd%/spark-monitoring:/spark-monitoring -v "%USERPROFILE%/.m2":/root/.m2 maven:3.6.1-jdk-8 /spark-monitoring/build.sh 

Linux:

chmod +x spark-monitoring/build.sh
docker run -it --rm -v `pwd`/spark-monitoring:/spark-monitoring -v "$HOME/.m2":/root/.m2 maven:3.6.1-jdk-8 /spark-monitoring/build.sh 
Configuring Databricks workspace

databricks configure --token
It will ask for the Databricks workspace URL and a token.
Use the personal access token that was generated when setting up the prerequisites.
You can get the URL from Azure portal > Databricks service > Overview.

 dbfs mkdirs dbfs:/databricks/spark-monitoring 

Open the file /src/spark-listeners/scripts/spark-monitoring.sh
Now add the Log Analytics Workspace ID and Key.

Use Databricks CLI to copy the modified script

dbfs cp <local path to spark-monitoring.sh> dbfs:/databricks/spark-monitoring/spark-monitoring.sh 

Use Databricks CLI to copy all JAR files generated

dbfs cp --overwrite --recursive <local path to target folder> dbfs:/databricks/spark-monitoring/ 
Create and configure the Azure Databricks cluster
  1. Navigate to your Azure Databricks workspace in the Azure Portal.
  2. On the home page, click on “new cluster”.
  3. Choose a name for your cluster and enter it in the text box titled “cluster name”.
  4. In the “Databricks Runtime Version” dropdown, select 5.0 or later (includes Apache Spark 2.4.0, Scala 2.11).

5. Under “Advanced Options”, click on the “Init Scripts” tab. Go to the last line under the “Init Scripts” section and select “DBFS” under the “destination” dropdown. Enter “dbfs:/databricks/spark-monitoring/spark-monitoring.sh” in the text box. Click the “Add” button.

6. Click the “create cluster” button to create the cluster. Next, click on the “start” button to start the cluster.

Now you can run jobs on the cluster and view the logs in the Log Analytics workspace.

We hope this article helps you set up the right configurations to send application logs and metrics from Azure Databricks to your Log Analytics workspace.

This infographic outlines a day in the life of a remote worker using Microsoft Teams to collaborate, create, and be more productive while working at home. See how this individual uses multiple features to stay connected with the team and work efficiently

The daily life of most workers has changed drastically as COVID-19 has made employees move to home offices. But your team can still get work done, and Microsoft Teams can make it possible. Contact us to enable Microsoft Teams for your organization.

Databricks is a web-based platform for working with Apache Spark that provides automated cluster management and IPython-style notebooks.  To understand the basics of Apache Spark, refer to our earlier blog on how Apache Spark works.

Databricks is currently available on Microsoft Azure and Amazon AWS.  In this blog, we will look at some of the components in Azure Databricks.

1.   Workspace

A Databricks Workspace is an environment for accessing all Databricks assets. The Workspace organizes objects (notebooks, libraries, and experiments) into folders, and provides access to data and computational resources such as clusters and jobs.

Create a Databricks workspace

The first step to using Azure Databricks is to create and deploy a Databricks workspace. You can do this in the Azure portal.

  1. In the Azure portal, select Create a resource > Analytics > Azure Databricks.
  2. Under Azure Databricks Service, provide the values to create a Databricks workspace.

    a. Workspace Name: Provide a name for your workspace.
    b. Subscription: Choose the Azure subscription in which to deploy the workspace.
    c. Resource Group: Choose the Azure resource group to be used.
    d. Location: Select the Azure location near you for deployment.
    e. Pricing Tier: Standard or Premium

Once the Azure Databricks service is created, you will get the screen given below.  Clicking on the Launch Workspace button will open the workspace in a new tab of the browser.

2.   Cluster

A Databricks cluster is a set of computation resources and configurations on which we can run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.

To create a new cluster:

  1. Select Clusters from the left-hand menu of Databricks’ workspace.
  2. Select Create Cluster to add a new cluster.

We can select the Scala and Spark versions by selecting the appropriate Databricks Runtime Version while creating the cluster.

3.   Notebooks

A notebook is a web-based interface to a document that contains runnable code, visualizations, and narrative text.  We can create a new notebook using either the “Create a Blank Notebook” link in the Workspace (or) by selecting a folder in the workspace and then using the Create >> Notebook menu option.

While creating the notebook, we must select a cluster to which the notebook is to be attached and also select a programming language for the notebook – Python, Scala, SQL, and R are the languages supported in Databricks notebooks.

The workspace menu also provides us the option to import a notebook, by uploading a file (or) specifying a file.  This is helpful if we want to import (Python / Scala) code developed in another IDE (or) if we must import code from an online source control system like git.

In the notebook below, we have Python code executed in cells Cmd 2 and Cmd 3, and PySpark code executed in Cmd 4.  The first cell (Cmd 1) is a Markdown cell.  It displays text which has been formatted using markdown language.

Magic commands

Even though the above notebook was created with Language as python, each cell can have code in a different language using a magic command at the beginning of the cell.  The markdown cell above has the code below where %md is the magic command:

%md Sample Databricks Notebook 

The following provides the list of supported magic commands:

  • %python – Allows us to execute Python code in the cell.
  • %r – Allows us to execute R code in the cell.
  • %scala – Allows us to execute Scala code in the cell.
  • %sql – Allows us to execute SQL statements in the cell.
  • %sh – Allows us to execute Bash Shell commands and code in the cell.
  • %fs – Allows us to execute Databricks Filesystem commands in the cell.
  • %md – Allows us to render Markdown syntax as formatted content in the cell.
  • %run – Allows us to run another notebook from a cell in the current notebook.
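
For instance, in a notebook whose default language is Python, a cell that begins with the %sql magic command runs Spark SQL. A trivial, illustrative example:

%sql
SELECT current_date() AS today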

4.   Libraries

To make third-party or locally built code available (like .jar files) to notebooks and jobs running on our clusters, we can install a library. Libraries can be written in Python, Java, Scala, and R. We can upload Java, Scala, and Python libraries and point to external packages in PyPI, or Maven.

To install a library on a cluster, select the cluster going through the Clusters option in the left-side menu and then go to the Libraries tab.

Clicking on the “Install New” option provides us with all the options available for installing a library.  We can install the library either by uploading it as a JAR file or by getting it from a file in DBFS (Databricks File System).  We can also instruct Databricks to pull the library from a Maven or PyPI repository by providing the coordinates.

5. Jobs

During code development, notebooks are run interactively in the notebook UI.  A job is another way of running a notebook or JAR either immediately or on a scheduled basis.

We can create a job by selecting Jobs from the left-side menu and then providing the name of the job, the notebook to be run, and the schedule of the job (daily, hourly, etc.).

Once the jobs are scheduled, the jobs can be monitored using the same Jobs menu.

6.   Databases and tables

A Databricks database is a collection of tables. A Databricks table is a collection of structured data. Tables are equivalent to Apache Spark DataFrames. We can cache, filter, and perform any operations supported by DataFrames on tables. You can query tables with Spark APIs and Spark SQL.

Databricks provides us the option to create new Tables by uploading CSV files; Databricks can even infer the data type of the columns in the CSV file.

All the databases and tables created either by uploading files or through Spark programs can be viewed using the Data menu option in the Databricks workspace, and these tables can be queried using SQL notebooks.
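
For example, assuming a table named default.sales_csv was created by uploading a CSV file (the table name here is hypothetical), it can be queried from a SQL notebook like any other table:

-- 'default.sales_csv' is a hypothetical table created via file upload
SELECT COUNT(*) AS row_count
FROM default.sales_csv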

We hope this article helps you get started with Azure Databricks. You can now spin up clusters and build quickly in a fully managed Apache Spark environment with the global scale and availability of Azure.

To keep business-critical applications running 24/7/365, it is important for organizations to have a sound business continuity and disaster recovery strategy. In this article, we will discuss how to set up disaster recovery for an Azure VM in a secondary region.

We will use Azure Site Recovery that helps manage and orchestrate disaster recovery of on-premises machines and Azure virtual machines (VM), including replication, failover, and recovery.

Pre-requisites:
  1. Recovery services vault
  2. A Virtual machine

ENABLING REPLICATION

To replicate a VM to a secondary region, prepare the Site Recovery infrastructure. In this case, we are replicating from one Azure region to another.

For replication from Azure to Azure, you can go directly to the Recovery Services vault and replicate the VM.

  1. Go to recovery services vault >> Replicated Items
  2. Click on the Replicate icon and follow.

3. Select the following

  • Source: Azure
  • Source location: Region where the VM is deployed
  • Azure virtual machine deployment model: Resource manager
  • Source subscription: Subscription where the VM is deployed
  • Source Resource group: Resource group where the VM is deployed

4. Select OK to proceed to next step.

5. Select the VM to replicate

6. Target configurations

  • Target location: Secondary region where the VM is to be replicated
  • Target subscription: Subscription where the VM to be replicated

By default, the following resources are created in the target region and can be customized as per your needs

  • Resource group
  • Virtual network
  • Cache storage account
  • Replica managed disks
  • Target availability sets (if applicable)

You can set replication policies and view extension details here

7. Click on Create target resources

8. Then select Enable replication

Go to recovery services vault >> Monitoring >> Site recovery Jobs to view the jobs that are running during the replication.

Look for the following jobs

  1. Prerequisites check for enabling protection
  2. Installing Mobility Service and preparing target
  3. Enable replication
  4. Starting initial replication
  5. Updating the provider states

Once the above-mentioned jobs are over, here is what happens

  • Synchronization process begins
  • Waiting for first recovery point
  • The VM is protected

Once all processes are completed, you can view the VM by going to recovery services vault >> Replicated Items

Introduction to Terraform

Terraform is an open-source tool for managing cloud infrastructure. Terraform uses Infrastructure as Code (IaC) for building, changing and versioning infrastructure safely. Terraform is used to create, manage, and update infrastructure resources such as virtual machines, virtual networks, and clusters.

The Terraform CLI provides a simple mechanism to deploy and version configuration files to Azure, and with the AzureRM provider you can create, modify, and delete Azure resources in a Terraform configuration.

The infrastructure that Terraform can manage, includes low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries, SaaS features, etc.

Providers in Terraform

A provider is responsible for understanding API interactions and exposing resources. Providers are generally IaaS platforms, for example:

  • Azure
  • AWS
  • Google Cloud
  • OpenStack
  • Docker
  • Alibaba Cloud
  • VMware   

For each provider, there are many kinds of resources you can create. Here is the general syntax for Terraform resources:

resource "<provider>_<type>" "<name>" {
  [config]
}

Where PROVIDER is the name of a provider (e.g., Azure), TYPE is the type of resource to create in that provider (e.g., Instance), NAME is an identifier you can use throughout the Terraform code to refer to this resource, and CONFIG consists of one or more arguments that are specific to that resource.

Terraform Features:

Infrastructure as Code

Infrastructure is described using a high-level configuration syntax. This allows a blueprint of your datacenter to be versioned and treated as you would any other code. Additionally, infrastructure can be shared and re-used.

Execution Plans

Terraform has a “planning” step where it generates an execution plan. The execution plan shows what Terraform will do when you call apply. This lets you avoid any surprises when Terraform manipulates infrastructure.

Resource Graph

Terraform builds a graph of all your resources and parallelizes the creation and modification of any non-dependent resources. Because of this, Terraform builds infrastructure as efficiently as possible, and operators get insight into dependencies in their infrastructure.

Change Automation

Complex changesets can be applied to your infrastructure with minimal human interaction. With the previously mentioned execution plan and resource graph, you know exactly what Terraform will change and in what order, avoiding many possible human errors.

TERRAFORM STRUCTURE

The primary module structure requirement is that a “root module” must exist. The root module is the directory that holds the Terraform configuration files that are applied to build your desired infrastructure. Any module should include, at a minimum, a “main.tf”, a “variables.tf” and “outputs.tf” file.

main.tf calls modules, locals, and data-sources to create all resources. If using nested modules to split up your infrastructure’s required resources, the “main.tf” file holds all your module blocks and any needed resources not contained within your nested modules.

variables.tf contains the input variable declarations.

outputs.tf tells Terraform what data is important. This data is outputted when “apply” is called and can be queried using the Terraform “output” command. It contains outputs from the resources created in main.tf.

TFVARS File – To persist variable values, create a file and assign variables within it. Within the current directory, Terraform automatically loads all files that match terraform.tfvars or *.auto.tfvars to populate variables.

MODULES

Modules are subdirectories with self-contained Terraform code. A module is a container for multiple resources that are used together. The root module is the directory that holds the Terraform configuration files that are applied to build your desired infrastructure. The root module may call other modules and connect them by passing output values from one as input values of another.

In production, we may need to manage multiple environments, and different products with similar infrastructure. Writing code to manage each of these similar configurations increases redundancy in the code.  And finally, we need the capability to test different versions while keeping the production infrastructure stable.

Terraform provides modules that allow us to abstract away re-usable parts, which can be configured once, and used everywhere. Modules allow us to group resources, define input variables which are used to change resource configuration parameters and define output variables that other resources or modules can use.

Modules can also call other modules using a “module” block, but we recommend keeping the module tree relatively flat and using module composition as an alternative to a deeply nested tree of modules, because this makes the individual modules easier to re-use in different combinations.

Terraform Workflow:

There are four steps to build infrastructure with Terraform:

  • Init
  • Plan
  • Apply
  • Destroy
INIT

Initialize the Terraform configuration directory using Terraform “init”.

Init will create a hidden directory “.terraform” and download plugins as needed by the configuration. Init also accepts the “-backend-config” option, which can be used for partial backend configuration.

Command

terraform init -backend-config="backend-dev.config"

backend-dev.config – This file contains the details shown in the screenshot below.

PLAN

The terraform plan command is used to create an execution plan. The plan lets you see all the resources that will be created, updated, or deleted before the changes are applied. The actual creation happens with the “apply” command.

The var file given will define resources that are unique for each team.

Command:

terraform plan -var-file="parentvarvalues.tfvars"

This file includes all global variables and Azure subscription details.

APPLY

The Terraform “apply” command is used to apply changes in the configuration. You’ll notice that the “apply” command shows you the same “plan” output and asks you to confirm if you want to proceed with this plan.

The “-auto-approve” parameter skips the confirmation prompt when creating resources. It’s better not to use it when you apply directly without running “plan” first.

Command:

terraform apply -var-file="parentvarvalues-team1.tfvars" -auto-approve

Terraform State Management:

Terraform stores the resources it manages into a state file. There are two types of state files: “remote” and “local”. While the “local” state is great for an isolated developer, the “remote” state is quite indispensable for a team, as each member will need to share the infrastructure state whenever there is a change.

Terraform compares those changes with the state file to determine what changes result in a new resource or resource modifications. Terraform stores the state about our managed infrastructure and configuration. This state is used by Terraform to map real-world resources to our configuration, keep track of metadata, and to improve performance for large infrastructures.

TERRAFORM IMPORT

The terraform import command is used to import existing infrastructure. This allows you to take resources you’ve created by some other means and bring them under Terraform management. This is a great way to slowly transition infrastructure to Terraform.

resource "azurerm_resource_group" "name" {
  # instance configuration
}

terraform import azurerm_resource_group.name /subscriptions/<subscription_id>/resourceGroups/<resource_group_name>

You want to import the state that already exists, so that the next time you run the “apply” command, Terraform already knows that the resource exists, and any changes made going forward will be picked up as modifications.

79% of analytics users encounter data questions they can’t solve each month. How are you helping your team to uncover insights from data?

Moving data isn’t always easy. There are more than 340 types of databases in use today and moving data across them presents challenges for any IT team. At CloudIQ Technologies, we have years of experience helping businesses find the IT solutions that can keep up with their constantly evolving business practice. Whether you need a solution for storage, data transfer, or just need to gain better insights from your data, we can help.

KEXP is known internationally for their music and authenticity. To help bring their global audience the music they want, KEXP needed to find a solution that could bring all of their services online. While they eventually accomplished their goal, they did run into roadblocks along the way.

Your partners at CloudIQ Technologies and Microsoft can help you overcome any obstacle. With years of industry experience, you can rest assured knowing your custom IT solution will be up and running in no time. Contact us today to find out more about how we can help.

Cloud security is a major challenge for organizations running mission-critical applications in the cloud. One of the biggest risks from hackers comes via open ports, and Microsoft Azure Security Center provides a great option to manage this threat – Just-in-Time VM access!

What is Just-in-Time VM access?

With Just-in-Time VM access, you can define which VMs and which ports can be opened and controlled, and for how long. Just-in-Time access locks down and limits the ports of Azure virtual machines in order to defend against malicious attacks, providing access to a port only for a limited amount of time. Basically, you block all inbound traffic at the network level.

When Just-In-Time access is enabled, every user’s request for access will be routed through Azure RBAC, and access will be granted only to users with the right credentials. Once a request is approved, the Security Center automatically configures the NSGs to allow inbound traffic to these ports – only for the requested amount of time, after which it restores the NSGs to their previous states.

The Just-in-Time option is available only in the Security Center Standard tier and is applicable only to VMs deployed via Azure Resource Manager.

What are the permissions needed to configure and use JIT?
To enable a user to: Configure or edit a JIT policy for a VM
Permissions to set: Assign these actions to the role:
  • On the scope of a subscription or resource group that is associated with the VM: Microsoft.Security/locations/jitNetworkAccessPolicies/write
  • On the scope of a subscription or resource group of the VM: Microsoft.Compute/virtualMachines/write

To enable a user to: Request JIT access to a VM
Permissions to set: Assign these actions to the user:
  • On the scope of a subscription or resource group that is associated with the VM: Microsoft.Security/locations/jitNetworkAccessPolicies/initiate/action
  • On the scope of a subscription or resource group that is associated with the VM: Microsoft.Security/locations/jitNetworkAccessPolicies/*/read
  • On the scope of a subscription or resource group or VM: Microsoft.Compute/virtualMachines/read
  • On the scope of a subscription or resource group or VM: Microsoft.Network/networkInterfaces/*/read

Why Just-in-Time access?

Consider the scenario where a virtual machine is deployed in Azure, and the management port is opened for all IP addresses all the time. This leaves the VM open for brute force attack.

Brute force attacks usually target management ports like SSH (22) and RDP (3389). If the attacker compromises the security, the whole VM will be open to them. Even though we might have NSG firewalls enabled in our Azure infrastructure, it’s best to limit the exposure of management ports within the team to a limited amount of time.

How to enable JIT?

Just-In-Time access can be implemented in two ways,

  1. Go to Azure security center and click on Just-in-Time VM

2. Go to the VM, then click on configuration and “Enable JIT”

How to set up port restrictions?

Go to Azure Security Center and click on recommendations for Compute & Apps.

Select the VM and click Enable JIT on VMs.

It will then show a list of recommended ports. It is possible to add additional ports as per requirement. The default port list is shown below.

Now click on the port that you wish to restrict. A new tab will appear with information on the protocol to be allowed and the allowed source IPs (a single IP address or a CIDR range).

The main thing to note is the request time. The default time is 3 hours; it can be increased or decreased as per the requirement. Then click OK.

Click OK and the VM will appear in the Just-in-Time access window in the security center.

What changes will happen in the infrastructure when JIT is enabled?

The Azure Security Center will create a new Deny rule in the Network security group’s inbound security rules, with a lower priority number than the original management port’s Allow rule, so that the Deny rule takes precedence.

If the VM is behind an Azure firewall, the same rule overwrite occurs in the Azure firewall as well.

How to connect to the JIT enabled VM?

Go to Azure security center and navigate to Just-in-Time access.

Select the VM that you need to access and click on “Request Access”

This will take you to the next page, where extra details need to be provided for connectivity, such as:

  1. Click ON Toggle
  2. Provide Allowed IP ranges
  3. Select time range
  4. Provide a justification for VM Access
  5. Click on Open Ports

This process will overwrite the NSG Deny rule and create a new Allow rule with less priority than the Deny All inbound rule or the selected port.

The connectivity request above offers two options for specifying the source IP:

  1. IP Range
  2. My IP

IP Range:

In this option, we can provide either a single IP or a CIDR block.

MY IP:

Case 1: If you're connected to Azure over a public network, i.e., without any IPsec tunnel, selecting MY IP registers the public IP address of the device you're connecting from.

Case 2: If you're connected to Azure via a VPN/IPsec tunnel/VNet gateway, you can't use the MY IP option, since it captures only your public IP address. In this scenario, provide the private IP address of the VPN gateway for a single user, or, to allow a group of users, provide the private IP CIDR block of the whole organization.

How to monitor who’s requested the access?

Users who request access are recorded in the Azure Activity Log. To view these activities in a Log Analytics workspace, link the subscription's Activity Log to Log Analytics.

Once the Activity Log is configured to flow into Log Analytics, the list of users who requested access can be viewed in the workspace using the Kusto Query Language (KQL).
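As a rough sketch, the JIT requests can also be pulled with the Azure CLI by filtering the activity log for the initiate action mentioned earlier; the resource group name and time window below are placeholders.

# Illustrative only: list recent JIT-related activity log entries
az monitor activity-log list \
  --resource-group my-vm-rg \
  --offset 7d \
  --query "[?contains(operationName.value, 'jitNetworkAccessPolicies')].{Caller:caller, Operation:operationName.value, Time:eventTimestamp}" \
  --output table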

In September 2019, Azure announced a brand-new service – Azure Private Link, a very important tool for service providers offering a mix of Azure IaaS and PaaS services.

Azure Private Link enables you to access Azure PaaS Services (for example, Azure Storage and SQL Database) and Azure-hosted customer-owned/partner services over a Private Endpoint in your virtual network. Traffic between your virtual network and the service traverses over the Microsoft backbone network, eliminating exposure from the public Internet. It can be used via a local IP address (on Azure and from on-premises networks) or via a dedicated Azure ExpressRoute network.

Well, naturally, the first benefit is security!  It reduces the exposure of PaaS services to the Internet and provides a secure way to manage traffic between the client’s network and Azure. With Private Link Service, data stays within Microsoft’s system and the client’s private network.

For service providers and their clients, this is obviously critical as it provides secure access to customers in their virtual network while giving them the ability to use the resources in the service provider’s subscription.

Find out how a Private Link Service can be created behind a standard load balancer.

In the example below, a Kubernetes ingress service is exposed as a Private Link Service. The ingress uses an internal Standard Load Balancer with the IP address 172.17.1.100.

Details of Ingress Service (Internal Load Balancer) 

cloudiq@hubandspoke:~$ kubectl get service -A | grep LoadBalancer
dev   ciq-demo-ingress-nginx-ingress-controller   LoadBalancer   192.168.3.11   172.17.1.100   80:32314/TCP,443:30694/TCP   43h

The service can be accessed as below from within the VNet (ciq-demo-vnet):

http://172.17.1.100/web/api/imageresult
Added this method for testing this API in API-MGMT. The current time is : 02/20/2020 10:07:23

The Private Link Service is created with the following details:

  • Alias – It is a unique URI identifying the service and can be accessed from anywhere within Azure.
  • NAT IP – This determines the Source IP and Destination IP of incoming and outgoing packets to the Private Link service, respectively. This NAT IP can be within any subnet of the service provider VNET
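For reference, here is an illustrative Azure CLI sketch of creating such a Private Link Service behind the Standard Load Balancer; every name below (resource group, service, VNet, subnet, load balancer, and frontend IP configuration) is a placeholder that must be replaced with your own values.

# Illustrative only: create a Private Link Service behind an existing Standard Load Balancer
az network private-link-service create \
  --resource-group ciq-demo-rg \
  --name ciq-demo-pls \
  --vnet-name ciq-demo-vnet \
  --subnet pls-subnet \
  --lb-name kubernetes-internal \
  --lb-frontend-ip-configs LoadBalancerFrontEnd \
  --location eastus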

Next, you create a private endpoint in the consumer VNet/subnet. In our example, we have created a network interface in the ciq-devops-general-rq-vnet/default VNet/subnet. The private IP within the VNet/subnet is 10.0.0.4, so the Kubernetes ingress service can be accessed from the consumer VNet using the 10.0.0.4 private IP.
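A corresponding, illustrative sketch of creating the consumer-side private endpoint; the names, subscription ID, and resource ID are placeholders.

# Illustrative only: create a private endpoint that connects to the Private Link Service above
az network private-endpoint create \
  --resource-group consumer-rg \
  --name ciq-demo-endpoint \
  --vnet-name ciq-devops-general-rq-vnet \
  --subnet default \
  --private-connection-resource-id "/subscriptions/<subscription-id>/resourceGroups/ciq-demo-rg/providers/Microsoft.Network/privateLinkServices/ciq-demo-pls" \
  --connection-name ciq-demo-connection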

cloudiq@cloudiq-build-agent-vm:~$ curl http://10.0.0.4/web/api/imageresult
Added this method for testing this API in API-MGMT. The current time is : 02/20/2020 10:09:03

Private Link can also be enabled for other Azure resources, as shown below.

For example, here a private endpoint was enabled for a storage account.

cloudiq@cloudiq-build-agent-vm:~$ curl http://k8sworkshopstg.blob.core.windows.net/test/hw.txt

Hello World!

cloudiq@cloudiq-build-agent-vm:~$ nslookup k8sworkshopstg.blob.core.windows.net
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
k8sworkshopstg.blob.core.windows.net    canonical name = k8sworkshopstg.privatelink.blob.core.windows.net.
Name:   k8sworkshopstg.privatelink.blob.core.windows.net
Address: 10.0.0.5

cloudiq@cloudiq-build-agent-vm:~$ curl http://k8sworkshopstg.privatelink.blob.core.windows.net/test/hw.txt

Hello World!

Welcome to Cloud View!

This week we look at why 2020 will be the launch of the ‘Data Decade’, top CX trends, general availability of Azure Sphere, and troubleshooting common problems in Kubernetes Deployments.

Industry News & Perspectives

Launch of the “Data Decade”

CRN asked nearly 80 CEOs five questions about how digital technologies will shape up in 2020 and beyond. Here is a summary of what they said.

Customer experience is no longer just the responsibility of client-facing departments. As more customers shop online, CX has become a top priority for CIOs as well. Here are 5 technology trends that should be a part of every CIO’s strategy.


Technology Updates

Azure Sphere

From its inception in Microsoft Research to general availability today, Azure Sphere is Microsoft's answer to escalating IoT threats. An interview with Galen Hunt, distinguished engineer and product leader of Azure Sphere.


DevOps and Agility

As DevOps becomes mainstream and with a range of frameworks to choose from, is DevOps losing its agility? Maybe, maybe not! Here is an article that will reconnect you to the grounding principles of DevOps – innovation and agility.


Microsoft Datacenters in Spain

Microsoft announces its strategy for establishing new European datacenters in Spain. While there is no fixed launch date, the company has announced that the proposed DCs will deliver Azure, Microsoft 365, Dynamics 365, and the Power Platform.


From CloudIQ

Optimizing Azure Cosmos DB Performance

Azure Cosmos DB allows Azure platform users to elastically and independently scale throughput and storage across any number of Azure regions worldwide. Here is an article on how to optimize Cosmos DB performance.

How to Debug and Troubleshoot Common Problems in Kubernetes Deployments

Kubernetes deployment issues are not always easy to troubleshoot. In some cases, the errors can be resolved easily, and, in some cases, detecting errors requires us to dig deeper and run various commands to identify and resolve the issues. Here is a guided tutorial to debug applications that are deployed into Kubernetes.

Welcome to Cloud View!

This week we look at Microsoft’s Azure strategy, Gartner’s getting smarter about digital business, DevSecOps, debugging Kubernetes applications and a deep dive into Kubernetes networking.

Industry Viewpoints & News

Smarter with Gartner

Kickstart the week with this incredibly ‘smart’ article by Smarter with Gartner. The article sounds a warning about sticking to the old ways of doing digital.

Microsoft’s Azure Strategy

Gavriella Schuster, Microsoft’s Channel Chief, sat down for an illuminating chat with CRN. The conversation revealed Microsoft’s Azure strategy, its channel investment priorities, and its upcoming plans for its partners.

Google Cloud acquires Cornerstone

Google Cloud just finalized another big acquisition with Cornerstone Technology, a mainframe specialist. The new purchase fits in perfectly with Google Cloud’s strategy to make the shifting of legacy applications on to the cloud easier.


Technical Insights 

DevSecOps

When paired together, Security and DevOps can offer organizations more robust and baked-in security. Find out how companies can do DevSecOps correctly in this two-part series by Devops.com.


Azure Firewall Manager

Microsoft extends Azure Firewall Manager preview to include automatic deployment and central security policy management for Azure Firewall in hub virtual networks.


Debugging a Kubernetes Application

General troubleshooting and debugging techniques for an application running in a Kubernetes environment and the most common issues to expect.


From CloudIQ

Kubernetes Networking Deep Dive – Data Plane, how it Works Under the Hood?

Kubernetes is simple enough to get started with; however, one of its most complex and critical parts is networking. Here is a deep dive into the data plane and how it works under the hood.

Microservices – Aligning business and technology for closer collaboration, agility & flexibility

A microservices-based architecture introduces agility and flexibility and supports a sustainable DevOps culture, ensuring closer collaboration within businesses. The good news is that it's happening for those who embrace it.

Welcome to Cloud View!

This week we look at Oracle Cloud Data Science Platform, GKE support for Windows, DevSecOps and making better quality software using Jenkins CI/CD pipeline.

Industry Perspective and News

Oracle Cloud Data Science Platform

Oracle is pulling out all the stops in 2020! The company is in the news once again with the launch of its data science platform – a first-of-its-kind cloud-native platform that is completely geared towards providing a collaborative workspace for data scientists.

Windows on GKE

Google Cloud now supports Windows on GKE, as part of its commitment to providing complete support for clients' Windows Server-based applications.

Six I’s of Successful IT Leaders

As every company becomes a tech company, the role of the CIO becomes critical for success. Naturally, this increases the pressure on CIOs to deliver more and think strategically. Here is an article that outlines 6 key focus areas that CIOs can start with.


Technological Insights

Storage is getting a reboot!

IBM is the first one to revamp its storage lines with the launch of Storwize and Flash Systems A9000.


DevSecOps

Security as a topic is never far off in any IT discussion. Here is a useful webinar on DevOpsTV by Jake King, CEO of Cmd. He lays out 7 DevOps-friendly techniques that will help you incorporate security without compromising on speed or scale.


CockroachDB

A fantastic chat with Peter Mattis, the creator of the CockroachDB open-source database and co-founder and CTO of Cockroach Labs. A conversation that covers interesting bits from his career in open source and Google.


From CloudIQ

Make Better Quality Software using Jenkins for your CI/CD Pipeline

With Jenkins, organizations can accelerate the software development process through automation. Jenkins integrates development life-cycle processes of all kinds, including build, document, test, package, stage, deploy, static analysis and much more.

Blue Green Deployment on Azure, Safe Strategy with Zero Downtime

When you deploy a new change into production, the deployment should behave in a predictable manner. In simple terms, this means no disruption and zero downtime!

Welcome to Cloud View!

This week we look at the State of DevOps, Hyperledger Fabric on AKS Marketplace template, Serverless computing frameworks and InfluxDB on GCP.

Industry News & Perspectives

Robot Resource Organizations

The adoption of new digital technologies and the ever-changing expectations of customers continue to challenge traditional retailers, forcing them to investigate new hybrid operational models that include artificial intelligence (AI), automation, and robotics.

Oracle Cloud

While the cloud market pie is divided into 3-4 large slices, there is still plenty of business in the thin sliver left for ‘others’. Oracle Cloud is doing its best to capture this small market and maybe become more than a niche infrastructure provider.

State of DevOps

The 8th annual ‘State of DevOps’ survey reports that the retail sector (as always) is the most advanced when it comes to DevOps adoption. Read on for more details.


Technical Insights

Hyperledger Fabric on AKS Marketplace template

Users with little knowledge of Azure or Hyperledger Fabric will now be able to easily set up a blockchain consortium on Azure with the new Hyperledger Fabric on Azure Kubernetes Service marketplace template.


Serverless computing frameworks

A new report by Datadog shows that almost 50% of the companies using its platform are opting for AWS Lambda serverless computing framework.


InfluxDB on GCP

Database provider InfluxDB is now live on GCP. It announced the availability of its managed cloud service as a part of Google Cloud’s open-source umbrella. Next on the agenda is the rollout of its second-generation serverless offering on Azure.


From CloudIQ

Develop Faster with Continuous Integration & the Tools to Get the Job Done

In recent years CI has become a best practice for software development and is guided by a set of key principles. Among them are revision control, build automation and automated testing.

Installing and Using HELM, the Package Manager for Kubernetes

Helm is a package manager for Kubernetes that allows developers and operators to more easily package, configure, and deploy applications and services onto Kubernetes clusters.

Welcome to Cloud View!

This week we look at Stackshare’s top 140 tools for developers in 2019, auto-labelling tool for AI developers, using Terraform for multi-cloud orchestration strategy and more.

Industry Perspective & News

Launchable aims to increase delivery velocity

This is the best use of any extra 25 minutes you have today. Jenkins founder Kohsuke Kawaguchi (KK) and respected DevOps veteran Harpreet Singh have launched a new company called Launchable that aims to increase delivery velocity. They talk all about it in this podcast (if you prefer to read, then the link has a complete transcript too).  

AWS maintains market share

January ended with the news of AWS posting revenues of $9.95 billion in Q4 2019 – bringing its 2019 revenue up to a grand total of $40bn. But, as we say, "cloud is complex," and there are plenty of big fish in the cloud market.


Tech Insights

Top 140 tools for developers in 2019

Here is an article every developer MUST bookmark. StackShare analyzed over four million data points shared in their community and shortlisted the definitive list of the top 140 tools for developers in 2019!


Auto-labelling tool for AI developers

After that mammoth list, we decided to continue with the tool theme. So here is another one – a new auto-labeling tool for AI developers by IBM.


Cisco HyperFlex Application Platform (HXAP)

Cisco launched a tool (or rather a whole bouquet of tools) that lets customers build their own cloud-native environments. The HyperFlex Application Platform (HXAP) offers a whole host of integrated tools such as container networking, storage, a load balancer, and more.


From CloudIQ

Kubernetes on Azure: A 2-day workshop for AKS developers

Container technology has revolutionized the DevOps landscape and offers organizations the chance to develop and test applications faster and more cost-effectively. CloudIQ’s 2-day hands-on workshop is designed to give DevOps team members the opportunity to skill-up and learn Kubernetes design, deployment, and management.

Terraform for Multi-Cloud Orchestration Strategy

Terraform, being cloud-agnostic, allows a single configuration to be used to manage multiple providers and even handle cross-cloud dependencies, simplifying management and orchestration.

There are many DevOps lifecycle tools out there; however, GitLab is a complete package designed for coordinating CI/CD pipelines.

GitLab is a web-based DevOps lifecycle tool. This application offers functionality to automate the entire DevOps life cycle from planning to creation, build, verify, security testing, deploying, and monitoring, offering high availability and replication. It is highly scalable and can be used on-prem or on the cloud. GitLab also includes a wiki, issue-tracking, and CI/CD pipeline features.

When DevOps projects are spread across large, geographically dispersed teams a complete DevOps tool is highly useful to maintain collaboration, incorporate feedback, avoid mistakes, and speed up the development process.

GitLab goes beyond being just a repository manager; it has built-in CI/CD, which saves enormous amounts of time and keeps the workflow smooth. Along with its own CI/CD, GitLab also allows a range of third-party integrations with external CI tools, so you always have the option of working with the tools that suit your workflow.

Here is a quick run-through on how to start with GitLab.

Project:

In GitLab, we can create projects for hosting a codebase, use them as an issue tracker, collaborate on code, and continuously build, test, and deploy apps with built-in GitLab CI/CD. Projects can be made public, internal, or private, as we choose. GitLab does not limit the number of private projects we can create.

Create a project in GitLab

In the dashboard, click the green “New project” button or use the plus icon in the navigation bar. This opens the New Project page.

On the New Project page:

  • Create a Blank project
  • Fill in the name of your project in the Project name field
  • The Project URL field is the URL path for the project that the GitLab instance will use
  • The Project slug field will be auto-populated
  • The Project description (optional) field enables you to enter a description for the project's dashboard
  • Changing the Visibility Level modifies the project's viewing and access rights for users
  • Selecting the Initialize repository with a README option creates a README file so that the Git repository is initialized, has a default branch, and can be cloned
  • Click Create project

Repository

A repository is part of a project, which also includes many other features.

Host your codebase in GitLab repositories by pushing files to GitLab. You can either use the user interface (UI) or connect your local computer with GitLab through the command line.

GitLab Basic Commands: https://docs.gitlab.com/ee/gitlab-basics/command-line-commands.html

Branch

When you create a new project, GitLab sets master as the default branch for your project. You can choose another branch to be your project's default under Settings > Repository.

Commits

When you commit your changes, you are introducing those changes to your branch. Via a command line, you can commit multiple times before pushing.

A commit message is important to identify what is being changed and, more importantly, why. In GitLab, you can add keywords to the commit message that will perform one of the actions below:

  • Trigger a GitLab CI/CD pipeline: If you have your project configured with GitLab CI/CD, you will trigger a pipeline per push, not per commit.
  • Skip pipelines: You can add the keyword [ci skip] to your commit message, and GitLab CI will skip the pipeline for that push (see the example after this list).
  • Cross-link issues and merge requests: Cross-linking is great to keep track of what’s somehow related in your workflow. If you mention an issue or a merge request in a commit message, they will be shown on their respective thread.
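For example (the issue number below is purely illustrative):

# Commit and push without triggering the CI/CD pipeline
git commit -m "Update deployment notes [ci skip]"
git push origin master

# Mentioning an issue cross-links the commit on that issue's thread
git commit -m "Fix login redirect, see #42"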

CI/CD Pipeline

Continuous Integration works by pushing small code chunks to your application's code base hosted in a Git repository and, for every push, running a pipeline of scripts to build, test, and validate the code changes before merging them into the main branch.

Continuous Delivery and Deployment take CI a step further, deploying your application to production on every push to the default branch of the repository.

These methodologies allow you to catch bugs and errors early in the development cycle, ensuring that all the code deployed to production complies with the code standards you established for your app.

Two top-level components are:

  1. .gitlab-ci.yml
  2. GitLab Runner

.gitlab-ci.yml

The .gitlab-ci.yml file is where we configure what CI does with the project. It lives in the root of the repository. On any push to the repository, GitLab will look for the .gitlab-ci.yml file and start jobs on Runners according to the contents of the file, for that commit.

Pipeline configuration begins with jobs. Jobs are the most fundamental element of a .gitlab-ci.yml file.

Jobs are:

  • Defined with constraints stating under what conditions they should be executed
  • Top-level elements with an arbitrary name that must contain at least the script clause
  • Not limited in how many can be defined

For example:

job1:
  script: "execute-script-for-job1"

job2:
  script: "execute-script-for-job2"

GitLab Runner

GitLab Runner is a build instance that is used to run jobs and send the results back to GitLab. It is designed to run on the GNU/Linux, macOS, and Windows operating systems.

Runners run the code defined in .gitlab-ci.yml. They are isolated (virtual) machines that pick up jobs through the coordinator API of GitLab CI. If we want to use Docker, install the latest version; GitLab Runner requires a minimum of Docker v1.13.0.

Types of Runners:

  1. Specific Runner – useful for jobs that have special requirements or for projects with a specific demand.
  2. Shared Runner – useful for jobs that have similar requirements across multiple projects. Rather than having multiple Runners idling for many projects, you can have a single Runner or a small number of Runners that handle multiple projects.
  3. Group Runner – useful when you have multiple projects under one group and would like all projects to have access to a set of Runners. Group Runners process jobs using a FIFO (First In, First Out) queue.

How to use a Runner:

  1. Install the Runner – https://docs.gitlab.com/runner/#install-gitlab-runner
  2. Register the Runner – https://docs.gitlab.com/runner/register/index.html

Sample Docker Project:

1. Create/upload two files: .gitlab-ci.yml and Dockerfile.

File contents:

.gitlab-ci.yml:

# Official docker image.
image: docker:19.03.0-dind

services:
  - docker:19.03.0-dind

before_script:
  - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY

build-master:
  stage: build
  script:
    - docker build --pull -t "$CI_REGISTRY_IMAGE" .
    - docker push "$CI_REGISTRY_IMAGE"
  tags:
    - docker

Dockerfile

FROM python:2.7
RUN pip install howdoi
CMD ["howdoi"]

Once we create the .gitlab-ci.yml file, each push will trigger the pipeline. We haven't created the Runner yet, so while committing these files, add "[ci skip]" to the commit message. This will skip the CI/CD pipeline.

2. Install Runner

For this example, we are using a specific Runner and are going to install it on Windows. Refer to this link to install the Runner on Windows: https://docs.gitlab.com/runner/install/windows.html

3. Register Runner:

In order to register a Runner, we need a registration token, which can be found under Settings > CI/CD > Runners.

Check the Specific Runner section there; you can copy the token from it.

To register a Runner under Windows run the following command in the path where we install GitLab Runner:
./gitlab-runner.exe register

Enter your GitLab instance URL:
Please enter the gitlab-ci coordinator URL (e.g. https://gitlab.com)
https://gitlab.com

Enter the token you obtained to register the Runner:
Please enter the gitlab-ci token that we copied earlier

Enter a description for the Runner, you can change this later in GitLab’s UI:
Please enter the gitlab-ci description for this Runner
[hostname] my-runner

Enter the tags associated with the Runner, you can change this later in GitLab’s UI:
Please enter the gitlab-ci tags for this Runner (comma separated):
docker

Enter the Runner executor:
Please enter the executor: ssh, docker+machine, docker-ssh+machine, kubernetes, docker, parallels, virtualbox, docker-ssh, shell:
docker

If you chose Docker as your executor, you’ll be asked for the default image to be used for projects that do not define one in .gitlab-ci.yml:
Please enter the Docker image (eg. ruby:2.6):
python:2.7

Once the Runner is created successfully, it will be displayed under Settings > CI/CD > Runner > Specific Runner section.

4. Now, go to CI/CD > Pipelines and click the Run Pipeline button.

It will open a new window. Click the Run Pipeline button again.

5. Once the job completes successfully, it will be displayed as below.

Welcome to Cloud View!

This week we look at some of the infrastructure and operations trends and insights on MLOps, Amazon Alexa and Blue green deployment on Azure.

Industry Perspective & News

AWS slashes cost of DR

Celebrate! AWS has announced massive cloud cost reductions on disaster recovery and Kubernetes services

Another week, another bunch of predictions. The difference here is that the article also details some trends that are on the wane.  


Tech Insights

MongoDB Supports GraphQL

MongoDB is now supporting GraphQL language for accessing its serverless application platform. This promises to extend their technology towards web and mobile apps.


CircleCI Orbs

In just a year since its launch, CircleCI Orbs are being used by over 13,000 organizations in close to 9 million CI/CD pipelines. New collaborations with 20 partners now extend the Orbs ecosystem even further.


MLOps

MLOps – the collaborative best practices that accelerate the machine learning lifecycle across model development, deployment, monitoring, and more – can give organizations a massive edge against competitors. John 'JG' Chirapurath, General Manager, Azure Data & AI explains more in this blog.


From CloudIQ

Amazon Alexa Custom Skills – How to Build One Step-by-Step

Amazon's Alexa is a voice-activated, interactive AI bot designed to respond to a number of commands and converse with people. Alexa Skills are apps that give Alexa even more abilities.

Blue Green Deployment on Azure, Safe Strategy with Zero Downtime

In Azure, different processes are available for implementing the Blue-Green strategy with two environments. In this article we discuss some of these techniques.

Cybersecurity is the number one concern for CEOs and is unanimously seen as the biggest threat in the coming years. Reports suggest that the damages from cyberattacks will amount to $6 trillion annually by 2021.

While a lot of news coverage is given to malicious hackers and ransomware attacks, another crucial area of cyber protection is tightening the internal defenses with intelligent identity management. Keeping a tight control on who can get past your firewalls is vital for maintaining optimum security.

In this article we will review the comprehensive set of security tools available in Azure Cloud.

Azure Active Directory

Multi-Factor Authentication

Azure Multi-Factor Authentication (MFA) helps safeguard access to data and applications while maintaining simplicity for users. It provides additional security by requiring a second form of authentication and delivers strong authentication via a range of easy to use authentication methods. Users may or may not be challenged for MFA based on configuration decisions that an administrator makes.

The security of two-step verification lies in its layered approach. Compromising multiple authentication factors presents a significant challenge for attackers. Even if an attacker manages to learn the user's password, it is useless without also having possession of the additional authentication method. It works by requiring two or more of the following authentication methods:

  • Something you know (typically a password)
  • Something you have (a trusted device that is not easily duplicated, like a phone)
  • Something you are (biometrics)

Conditional Access policies

Conditional Access is the tool used by Azure Active Directory to bring signals together, to make decisions, and enforce organizational policies. Conditional Access policies at their simplest are if-then statements; if a user wants to access a resource, then they must complete an action. Example: A payroll manager wants to access the payroll application and is required to perform multi-factor authentication to access it.

Azure AD identity protection

Identity Protection is a tool that allows organizations to accomplish three key tasks:

  • Automate the detection and remediation of identity-based risks
  • Investigate risks using data in the portal
  • Export risk detection data to third-party utilities for further analysis

The signals generated by and fed to Identity Protection can be further fed into tools like Conditional Access to make access decisions, or fed back to a security information and event management (SIEM) tool for further investigation based on your organization's enforced policies.

Azure AD Privileged Identity Management

Privileged Identity Management provides time-based and approval-based role activation to mitigate the risks of excessive, unnecessary, or misused access permissions on resources that you care about. Here are some of the key features of Privileged Identity Management:

  • Provide just-in-time privileged access to Azure AD and Azure resources
  • Assign time-bound access to resources using start and end dates
  • Require approval to activate privileged roles
  • Enforce multi-factor authentication to activate any role
  • Use justification to understand why users activate
  • Get notifications when privileged roles are activated
  • Conduct access reviews to ensure users still need roles
  • Download audit history for internal or external audit

Network Security

Network Security Groups (NSGs)

Network security group security rules are evaluated by priority using the 5-tuple information (source, source port, destination, destination port, and protocol) to allow or deny the traffic. A flow record is created for existing connections. Communication is allowed or denied based on the connection state of the flow record. The flow record allows a network security group to be stateful.

If you specify an outbound security rule to any address over port 80, for example, it's not necessary to specify an inbound security rule for the response to the outbound traffic. You only need to specify an inbound security rule if communication is initiated externally. The opposite is also true. If inbound traffic is allowed over a port, it's not necessary to specify an outbound security rule to respond to traffic over the port. Existing connections may not be interrupted when you remove a security rule that enabled the flow. Traffic flows are interrupted when connections are stopped, and no traffic is flowing in either direction, for at least a few minutes.
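To make the rule evaluation concrete, here is a minimal Azure CLI sketch that adds a high-precedence inbound rule; the resource group and NSG names are placeholders, and because NSGs are stateful, no matching outbound rule is needed for the response traffic.

# Illustrative only: allow inbound HTTP on an existing NSG
az network nsg rule create \
  --resource-group my-rg \
  --nsg-name my-nsg \
  --name Allow-HTTP-Inbound \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 80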

Azure Firewall

With Azure Firewall, you can configure application rules that define fully qualified domain names (FQDNs) that can be accessed from a subnet, and network rules that define source address, protocol, destination port, and destination address. Network traffic is subjected to the configured firewall rules when you route your network traffic to the firewall as the subnet default gateway.

Application security groups

Application security groups enable you to configure network security as a natural extension of an application’s structure, allowing you to group virtual machines and define network security policies based on those groups. You can reuse your security policy at scale without manual maintenance of explicit IP addresses. The platform handles the complexity of explicit IP addresses and multiple rule sets, allowing you to focus on your business logic.

Resource management security

Azure resource locks

As an administrator, you may need to lock a subscription, resource group, or resource to prevent other users in your organization from accidentally deleting or modifying critical resources. You can set the lock level to CanNotDelete or ReadOnly. In the portal, the locks are called Delete and Read-only, respectively. CanNotDelete means authorized users can still read and modify a resource, but they can’t delete the resource. ReadOnly means authorized users can read a resource, but they can’t delete or update the resource. Applying this lock is similar to restricting all authorized users to the permissions granted by the Reader role.
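As a quick illustration, the same locks can be applied from the Azure CLI; the resource group and lock names below are placeholders.

# Prevent accidental deletion of a resource group
az lock create --name do-not-delete --resource-group my-critical-rg --lock-type CanNotDelete

# Restrict authorized users to read-only operations
az lock create --name read-only --resource-group my-critical-rg --lock-type ReadOnly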

Azure policies

Azure Policy is a service in Azure that you use to create, assign, and manage policies. These policies enforce different rules and effects over your resources, so those resources stay compliant with your corporate standards and service level agreements. Azure Policy meets this need by evaluating your resources for non-compliance with assigned policies. All data stored by Azure Policy is encrypted at rest. For example, you can have a policy to allow only a certain SKU size of virtual machines in your environment. Once this policy is implemented, new and existing resources are evaluated for compliance.
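As a rough sketch, a built-in definition such as "Allowed virtual machine size SKUs" can be assigned from the Azure CLI; the resource group, assignment name, and SKU list below are placeholders, and the definition name and parameter name should be verified with az policy definition show, since built-in definitions are identified by GUIDs.

# Look up the built-in definition's name (a GUID)
az policy definition list --query "[?displayName=='Allowed virtual machine size SKUs'].name" -o tsv

# Assign it to a resource group with a restricted SKU list
az policy assignment create \
  --name restrict-vm-skus \
  --resource-group my-rg \
  --policy <definition-name-from-previous-step> \
  --params '{"listOfAllowedSKUs": {"value": ["Standard_D2s_v3"]}}'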

Custom RBAC roles

Granting permission using custom Azure AD roles is a two-step process that involves creating a custom role definition and then assigning it using a role assignment. A custom role definition is a collection of permissions that you add from a preset list. These permissions are the same permissions used in the built-in roles.

Once you've created your role definition, you can assign it to a user by creating a role assignment. A role assignment grants the user the permissions in a role definition at a specified scope. This two-step process allows you to create a single role definition and assign it many times at different scopes. A scope defines the set of Azure AD resources the role member has access to.
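For the Azure resource (RBAC) side of this two-step flow, here is a minimal sketch; the role name, description, actions, scope, and assignee are all placeholders.

# Step 1: create the custom role definition from a JSON file
cat > vm-operator-role.json <<'EOF'
{
  "Name": "Virtual Machine Operator (custom)",
  "Description": "Can read and restart virtual machines.",
  "Actions": [
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Compute/virtualMachines/restart/action"
  ],
  "NotActions": [],
  "AssignableScopes": ["/subscriptions/<subscription-id>"]
}
EOF
az role definition create --role-definition vm-operator-role.json

# Step 2: assign the role at a chosen scope
az role assignment create \
  --assignee user@example.com \
  --role "Virtual Machine Operator (custom)" \
  --resource-group my-vm-rg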

Encryption for data at rest

Azure SQL Database Always Encrypted

Always Encrypted is a data encryption technology in Azure SQL Database and SQL Server that helps protect sensitive data at rest on the server, during movement between client and server, and while the data is in use, ensuring that sensitive data never appears as plaintext inside the database system. After you encrypt data, only client applications or app servers that have access to the keys can access plaintext data.

Implement database encryption

Transparent data encryption (TDE) helps protect Azure SQL Database, Azure SQL Managed Instance, and Azure Data Warehouse against the threat of malicious offline activity by encrypting data at rest. It performs real-time encryption and decryption of the database, associated backups, and transaction log files at rest without requiring changes to the application. By default, TDE is enabled for all newly deployed Azure SQL databases.
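To verify or change the TDE state on an existing database, here is a minimal Azure CLI sketch; the server, database, and resource group names are placeholders.

# Show the current TDE state, then enable it if needed
az sql db tde show --resource-group my-rg --server my-sql-server --database my-db
az sql db tde set --resource-group my-rg --server my-sql-server --database my-db --status Enabled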

Implement Storage Service Encryption

Data in Azure Storage is encrypted and decrypted transparently using 256-bit AES encryption, one of the strongest block ciphers available, and is FIPS 140-2 compliant. Azure Storage encryption is similar to BitLocker encryption on Windows.

Azure Storage encryption is enabled for all new storage accounts, including both Resource Manager and classic storage accounts. Azure Storage encryption cannot be disabled. Because your data is secured by default, you don’t need to modify your code or applications to take advantage of Azure Storage encryption.

Implement disk encryption

Azure Disk Encryption helps protect and safeguard your data to meet your organizational security and compliance commitments. It uses the Bitlocker feature of Windows to provide volume encryption for the OS and data disks of Azure virtual machines (VMs), and is integrated with Azure Key Vault to help you control and manage the disk encryption keys and secrets.
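A minimal Azure CLI sketch; the VM, Key Vault, and resource group names are placeholders.

# Enable Azure Disk Encryption on a VM using an existing Key Vault
az vm encryption enable \
  --resource-group my-rg \
  --name my-vm \
  --disk-encryption-keyvault my-keyvault \
  --volume-type All

# Check the encryption status afterwards
az vm encryption show --resource-group my-rg --name my-vm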

Configure application security

Configure SSL/TLS certs

If you purchase an App Service Certificate from Azure, Azure manages the following tasks:

  • Takes care of the purchase process from GoDaddy
  • Performs domain verification of the certificate
  • Maintains the certificate in Azure Key Vault
  • Manages certificate renewal (see Renew certificate)
  • Synchronizes the certificate automatically with the imported copies in App Service apps

Configure and Manage Key Vault

Manage access to Key Vault

Azure Key Vault is a cloud service that safeguards encryption keys and secrets like certificates, connection strings, and passwords. Because this data is sensitive and business-critical, you need to secure access to your key vaults by allowing only authorized applications and users.

Access to a key vault is controlled through two interfaces: the management plane and the data plane. The management plane is where you manage Key Vault itself. Operations in this plane include creating and deleting key vaults, retrieving Key Vault properties, and updating access policies. The data plane is where you work with the data stored in a key vault. You can add, delete, and modify keys, secrets, and certificates.

To access a key vault in either plane, all callers (users or applications) must have proper authentication and authorization. Authentication establishes the identity of the caller. Authorization determines which operations the caller can execute.

Both planes use Azure Active Directory (Azure AD) for authentication. For authorization, the management plane uses role-based access control (RBAC), and the data plane uses a Key Vault access policy.
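As a minimal sketch of both planes (the vault name, application ID, scope, and assignee are placeholders):

# Data plane: grant an application only the secret permissions it needs
az keyvault set-policy \
  --name my-keyvault \
  --spn <application-client-id> \
  --secret-permissions get list

# Management plane: grant a user management rights through RBAC
az role assignment create \
  --assignee user@example.com \
  --role "Key Vault Contributor" \
  --scope /subscriptions/<subscription-id>/resourceGroups/my-rg/providers/Microsoft.KeyVault/vaults/my-keyvault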

Welcome to Cloud View!

This week we look at the worldwide IT spending for 2020, aligning IoT and AI with business goals and the Kubernetes bug bounty program.

Tech Insights

IT Spending 2020

Gartner starts the year with some great news – Worldwide IT spending is projected to increase by 3.4% to $3.9 trillion in 2020. Get more insights from the Gartner report here.

Adoption of IoT and AI

To get the full benefits of IoT and AI, their adoption will need to merge with business strategies and goals. And that will change the way businesses are structured.


Industry Insights

Kubernetes-based Red Hat OpenShift 4.3 

Red Hat’s cloud-native commitment stays strong! The company just released its Kubernetes-based Red Hat OpenShift 4.3 and Red Hat OpenShift Container Storage 4 to provide multi-cloud Kubernetes container support.


Kubernetes bug bounty program

The Kubernetes Product Security Committee is launching a new bug bounty program to tap into the power of the highly active Kubernetes community to find vulnerabilities in the software. Find out how you can get started.


From CloudIQ

Introduction to Machine Learning and How It Works

In this article we look at the basics of machine learning, the different algorithm models and a simple machine learning algorithm example using Python.

How to build real-time streaming data pipelines and applications using Apache Kafka?

In this article, we discuss how to use Apache Kafka, the distributed publish-subscribe messaging system, to pass messages from one endpoint to another.

Welcome to Cloud View!

This week we look at some of the technologies of the future and insights on container performance & security, connected vehicles, and chatbots.

Latest in Tech

CES 2020

2020 starts with the biggest tech event of the year – CES 2020. The Las Vegas event showcases the technology of the future! Here are the highlights of some of the enterprise technologies on display.

Technology 2020

Want to know the latest technology trends that will impact businesses in 2020? From the empowered edge to human augmentation and more, here are 15 of them.


Industry Insights

Service Mesh

Containers are dominating the software world, but despite their popularity and orchestration software like Kubernetes, they are still challenging to manage. Service meshes come as the answer to improving container performance and security.


Connected Vehicles

The world of vehicle software is heating up! BlackBerry QNX and AWS are targeting automotive OEMs to bring services, personalization, health monitoring, and advanced driver assistance (ADAS) to vehicles.


Chatbots

In the next couple of years, 70% of white-collar workers will chat with conversation AI platforms daily – predicts Gartner. Here is a case study of improving customer service with an intelligent virtual assistant using IBM Watson.


From CloudIQ

Azure Database for MySQL and Grafana to monitor Azure services

More and more organizations run their business-critical applications on containers using Azure and that calls for a more intuitive dashboard to monitor and track Azure Services.

How to Create and Run Spark Clusters with Qubole using AWS

Qubole is a platform that puts big data on the cloud to power business decisions based on real-time analytics. Here is how you can create and run Spark Clusters with Qubole using AWS.

As more and more organizations run their business-critical applications on containers using Azure, there are new challenges in monitoring and managing them. Of course, there is the Azure dashboard, but with elaborate set-ups and such, IT teams feel the need for a more intuitive dashboard to monitor and track Azure services.

The answer, Grafana.

Grafana is an open-source dashboard and graph editor for Graphite, Elasticsearch, OpenTSDB, Prometheus, and InfluxDB. It is a powerful visualization application that deals effectively with large-scale measurement data and time-series data.

As compared to other dashboards, especially the native Azure dashboard, Grafana offers a wider variety of visualization options (graphs, heatmaps, tables, and more) and can collect and collate data from multiple sources. It is designed for evaluating metrics such as system CPU, memory, disk, and I/O utilization.

A Grafana dashboard will help you understand, analyze, monitor, and explore your data with flexible and fast visualization tools.

In this article, we will look at using Azure Database for MySQL and Grafana to monitor Azure services.

Access Requirements

In your Azure subscription, your account must have "Microsoft.Authorization/*/Write" access to assign an AD app to a role. This permission is granted through the "Owner" or "User Access Administrator" roles. The "Contributor" role does not have the required permissions.

Virtual machine requirements:

  • VM operating system: Linux (Ubuntu 18.04)
  • VM size: Standard D2s v3 (2 vCPUs, 8 GiB memory) is more than enough
  • SSH access: username and password
  • Default port: 3000
  • NSG rule: open an inbound rule in the network security group with limited access to ports 3000 and 22 (SSH)
  • Assign a static public IP address to the VM

MySQL Creation and Linking to Grafana

1. Create an Azure Database for MySQL server.

2. Select the resource group and provide the server name, admin username, password, and password confirmation. Take note of the password; it is used several times throughout the set-up.

3. To select compute and storage,

a. There are three pricing tiers, (choose basic)

  • Basic
  • General-purpose
  • Memory-optimized

b. Select the appropriate sizes.

  • Compute generation: Gen 5
  • vCore: 1
  • Storage: 5 GB
  • Auto-growth
  • Backup retention period
  • Locally redundant / Geo-redundant backup

For the Basic compute and storage tier, the maximum is 2 vCores and 1024 GB of storage. Choose as per your needs.

4. Then click Review+Create

5. Once the Azure Database for MySQL server is deployed, go to Connection security and make the following changes:

  • Add a client IP.
  • Set “Allow access to Azure services” to ON

6. Connect to the server using the server admin login name and password in SQL Workbench and create a new query tab. (You can use any tool to connect to MySQL.)

7. Run the following command in the query tab:

  • CREATE DATABASE grafana;

8. Now the MySQL server-side configuration is complete. Next, we need to provide the MySQL server details to the Docker container running Grafana.

9. Log in to the VM that will run Grafana using the appropriate SSH credentials (password or access keys).

10. Note the following values and save them as environment variables in an environment list (env file).

Type     = mysql
Host     = <servername>:3306 (the MySQL server name created earlier)
Name     = grafana (the database created and granted access in the earlier steps)
User     = <server admin login name>
Password = <server login password>

11. Save the values above as an environment list file, as shown.
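As an illustration, assuming Grafana's standard GF_DATABASE_* environment variables are used to carry these values, env.list could look like the following (angle-bracket values are placeholders):

# Illustrative env.list for the Grafana container
cat > env.list <<'EOF'
GF_DATABASE_TYPE=mysql
GF_DATABASE_HOST=<servername>.mysql.database.azure.com:3306
GF_DATABASE_NAME=grafana
GF_DATABASE_USER=<server admin login name>
GF_DATABASE_PASSWORD=<server login password>
EOF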

Installing Grafana as a Docker container and its required plugins

1. Login to the server using appropriate credentials,

2. Get updates using, sudo apt-get update

3. Install docker using the command, sudo apt install docker.io

4. Enable and start docker,

  • sudo systemctl start docker
  • sudo systemctl enable docker

5. Verify the installation using the command,

  • docker --version; the result will be like this,

6. Now login as root using the command,

  • sudo su

7. Pull the Grafana image; this needs an Internet connection, as it will download the image from the public Docker Hub.

  • docker pull grafana/grafana

8. Run the image with saved environment variables,

  • docker run -d --name=grafana -p 3000:3000 --env-file ./env.list grafana/grafana

9. verify the container installation using “docker ps” command

10. The next step is to install the plugins for Grafana, which will be used in setting up the dashboard. We need to login to the container created previously to install these plugins.

11. Now create a shell inside the container using,

  • docker exec -it grafana /bin/bash

12. The result will be as shown,

13. By default, the Grafana dashboard has a limited number of panel plugins; to use more visualizations, we can install plugins manually. Now copy the plugin installation commands listed below and run them one by one, or all at once.

  • grafana-cli plugins install michaeldmoore-annunciator-panel
  • grafana-cli plugins install grafana-piechart-panel
  • grafana-cli plugins install farski-blendstat-panel
  • grafana-cli plugins install michaeldmoore-multistat-panel
  • grafana-cli plugins install grafana-polystat-panel
  • grafana-cli plugins install flant-statusmap-panel
  • grafana-cli plugins install grafana-clock-panel
  • grafana-cli plugins install neocat-cal-heatmap-panel
  • grafana-cli plugins install briangann-gauge-panel
  • grafana-cli plugins install natel-plotly-panel

14. Once the plugins are installed, verify the installation by going into the /var/lib/grafana/plugins directory:

  • cd /var/lib/grafana/plugins
  • use the ls command to view the installed plugins

15. Now exit the container, command: exit

16. Now restart the container using,

  • docker container restart grafana (here "grafana" refers to the container name created earlier, which can be found using the docker ps command)

Linking Azure Monitoring Tools

Service Principal

  • Register an app in Azure Active Directory (AD).
  • Create a client secret in the Registered app.
  • Go to the subscription > Access control (IAM) > search for the app registration and provide "Reader" access to the app registered in Azure AD in the first step (see the CLI sketch below).
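As an alternative to the portal steps above, here is an illustrative Azure CLI sketch that creates the app registration, service principal, and Reader role assignment in a single command; the display name and subscription ID are placeholders.

az ad sp create-for-rbac \
  --name grafana-monitoring \
  --role Reader \
  --scopes /subscriptions/<subscription-id>

# The appId, password, and tenant values in the output map to the client ID,
# client secret, and tenant ID that Grafana asks for below.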

Applying the service principal to Grafana,

  • Go to Grafana UI by using public IP address followed by port number,
    i.e., (IP address):3000 example: 13.25.49.164:3000
  • Now add a data source and click Select.
  • On the page that appears, input the tenant ID, client ID, and client secret.
  • Then provide the details for the Log Analytics workspace and Application Insights.
  • If the provided details are correct, this message should be displayed.

Welcome to Cloud View!

We hope you had a joyful and fun holiday! Now, with the festivities behind us, it's time for business again!

Let’s start with what IT Leaders are planning for 2020.

Leader’s Speak

CIO’s plan for 2020

What are the CIOs thinking and planning for the coming year? It seems finding talent, dealing with rising security problems, and prioritizing the acquisition of new technologies are some of the topics occupying the C-suite.

IT Industry in 2020

Joel Friedman, CTO, Rackspace, offers his take on what 2020 will hold for the IT industry. According to Joel, hybrid and multi-cloud, SaaS, and security problems will all grow.

A decade since the launch of Azure

2020 marks a decade since the launch of Azure. Ever wondered what the founders think about their creation? Here’s a short interview with Microsoft’s Yousef Khalidi and Hoi Vo, key members of the original Azure ‘dream team’.


Industry insights

AI Solutions

As a digital-first service provider, we at CloudIQ help implement AI solutions across organizations. Impact like this is what we aim for! This is what the future of AI looks like.


Top Big Data companies to watch for in 2020

Big Data is going to dominate many a boardroom in the coming few years. We start the year by tracking some promising companies that will define the coming year with their next-generation data management, data science, and machine learning technology.


From CloudIQ

Deploying a Pod containing three applications using Jenkins CI/CD pipeline and updating them selectively

Kubernetes pod is a layer of abstraction wrapped around containers to group them together for resource allocation and efficient management. Here is how to deploy a pod containing three applications using Jenkins CI/CD pipeline and update them selectively.

Provisioning Cloud Infrastructure using AWS CloudFormation Templates

Spend less time managing cloud infrastructure and focus on building your application, thanks to AWS CloudFormation templates. Here is a quick start guide to creating the templates for provisioning cloud infrastructure.

A Kubernetes pod – incidentally, some say it is named after a whale pod because the Docker logo is a whale – is the foundational unit of execution in a K8s ecosystem. While Docker is the most common container runtime, pods are container-agnostic and support other runtimes as well.

Simply put, a K8s pod is a layer of abstraction wrapped around containers to group them together to allocate resources and to manage them efficiently.

Continuous integration and delivery or CI/CD is a critical part of DevOps ecosystems, especially for cloud-native applications. DevOps teams frequently use Jenkins CI/CD pipelines to increase the speed and quality of collaborated software development ecosystems by adding automation. Thanks to Helm, deploying Jenkins server to K8s is quick and easy. The difficult bit is building the pipeline.

Here is a post that describes how to deploy a pod containing three applications using a Jenkins CI/CD pipeline and update them selectively.

Task on Hand:

Use a Jenkins pipeline to build a Spring Boot application to generate a jar file, dockerize the application to create a Docker image, push the image to a Docker repository, and pull the image into the pod to create containers. The pod should contain three containers for the three applications, respectively. Upon git commit, only the container for which there is a change must be updated (rolling update).

Steps
  1. Create a pipeline using groovy script to clone the respective git repo, build the project using maven, build the docker images, push it to dockerhub and pull these images to run containers in the pod.
  2. Repeat the steps for all the three applications in separate stages. Make sure to create a separate directory in each stage to prevent conflicts when using similar files. Also, this clones the different git repos into different folders to avoid confusion.
  3. Here is the Jenkinsfile/Pipeline script to perform the above task:
pipeline {
    agent any
    stages {
        stage('Build1') {
            steps {
                dir('app1') {
                    script {
                        git 'https://github.com/cloud/simple-spring.git'
                        sh 'mvn clean install'
                        app = docker.build("cloud007/simple-spring")
                        docker.withRegistry("https://registry.hub.docker.com", "dockerhub") {
                            // dockerImage.push()
                            app.push("latest")
                        }
                    }
                }
            }
        }
        stage('Build2') {
            steps {
                dir('app2') {
                    script {
                        git 'https://github.com/cloud/simple-spring-2.git'
                        sh 'mvn clean install'
                        app = docker.build("cloud007/simple-spring-2")
                        docker.withRegistry("https://registry.hub.docker.com", "dockerhub") {
                            // dockerImage.push()
                            app.push("latest")
                        }
                    }
                }
            }
        }
        stage('Build3') {
            steps {
                dir('app3') {
                    script {
                        git 'https://github.com/cloud/simple-spring-3.git'
                        sh 'mvn clean install'
                        app = docker.build("cloud007/simple-spring-3")
                        docker.withRegistry("https://registry.hub.docker.com", "dockerhub") {
                            // dockerImage.push()
                            app.push("latest")
                        }
                    }
                }
            }
        }
        stage('Orchestrate') {
            steps {
                script {
                    sh 'kubectl apply -f demo.yaml'
                }
            }
        }
    }
}

4. Make sure to configure Docker properly and expose the Docker daemon (dockerd) on port 4243, then change the permissions on /var/run/docker.sock as shown, so that Jenkins can use Docker commands.

5. Coming to integrating Kubernetes with Jenkins, it can be done using two plugins:

  • Kubernetes plugin: When using this plugin, we configure the credentials to use our local cluster/Azure cluster and specify the container templates for the containers to be created in the pipeline. But since all the tasks must run in containers, it is a somewhat confusing approach. A better approach would be to use the kubernetes-cli plugin.

Refer: https://github.com/jenkinsci/kubernetes-plugin

  • Kubernetes-cli plugin: It provides withKubeConfig() for pipeline support, which uses the configured credentials to connect to our cluster and perform kubectl commands. However, when running the pipeline, the kubeconfig wasn't recognized for some reason and kept giving a 'file not found' error.

Refer: https://github.com/jenkinsci/kubernetes-cli-plugin/blob/master/README.md

Hence, we installed kubectl on the Jenkins host, configured the cluster manually, and ran shell commands from the Jenkins pipeline; there Jenkins was recognized as an anonymous user, which was only granted get access and couldn't create pods or deployments.

Here are some common problems faced during this process and the troubleshooting procedure.

  • Configuring Jenkins to use local minikube cluster:
    We had trouble using both the plugins to properly configure Jenkins to create deployments as required. Using shell commands to run kubectl was also not successful since Jenkins was recognized as an anonymous user, and authorization prevented anonymous users from creating deployments.
  • Permission for /var/run/docker.sock is reset to root after every restart, so make sure to change it so that Jenkins can continue to use Docker commands, for example: chown jenkins /var/run/docker.sock
  • Installing Minikube:
    i) Start the minikube cluster using Hyper-V as the driver, after creating a virtual switch:
    minikube start --vm-driver=hyperv --hyperv-virtual-switch="Primary Virtual Switch"

    ii) Installation takes a lot of time, so we have to wait patiently; eventually, the cluster will get configured and started. If there is a problem with the apiserver, stop the machine after SSHing into the minikube VM:
    minikube ssh
    sudo poweroff

    iii) Then start minikube the same way.

Here are some suggested best practices.

Maintaining the git repo:
  • Branching must be used while updating the source code or adding a particular file to the repository. Suppose you want to add a README: create a new branch from master, create the README, commit it, and then merge the branch with the origin.
  • Similarly, for adding test files or changing source code, create a new branch for the testing/modification, update and commit the code, and merge with master when finished. The purpose of this is to allow an easy rollback to the original master if you run into errors when working with the new files, and to prevent any code conflicts with master.
  • Commit only after you have tested the code properly, never commit incomplete code.
  • Write good commit messages to keep track of the changes you have made.
Versioning:
  • Follow the versioning convention X.Y.Z, where X is incremented for a major update/feature, Y for minor updates/features, and Z for patches/bug fixes.
  • Avoid version lock, where a package has too many dependencies pinned to a single version. In such a scenario, the package can only be updated after releasing new versions for every dependent package.
Docker repo:
  • Use unique tags for deployment/pushing images to the repository.
  • Always use stable tags for building images (for example, base-image tags that keep receiving patch updates), but never deploy/pull images using stable tags. A stable tag keeps rolling forward to newer builds, so the image behind it can change over time, whereas a unique tag stays bound to one specific build (see the sketch just below).
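
As an illustration of unique vs. stable tags, a minimal sketch (the image name matches the pipeline later in this post; the BUILD_NUMBER-based tag is a hypothetical example):

# Build once, tagging with both a stable tag and a unique per-build tag
docker build -t cloud007/simple-spring:latest -t cloud007/simple-spring:1.0.${BUILD_NUMBER} .

# Push and deploy the unique tag so the deployed image never changes underneath you
docker push cloud007/simple-spring:1.0.${BUILD_NUMBER}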

Welcome to Cloud View!

The last two days of 2019 feel a bit like a waiting period – it’s a tad early to start celebrating, but it’s hard to plan anything until the new year celebrations are truly behind us and we are back at work. We think a good way to use this time would be to indulge in a bit of nostalgia and look back at how the technology landscape evolved in 2019.

Recap 2019

Kubernetes Podcast in 2019

If you deal with Kubernetes, then we are sure you follow the Kubernetes Podcast. Here is the roundup of the year’s best! Enjoy!

CRN’s 2019 Year in Review

LOVE ‘Top-10’ listicles? Here is a mammoth list of technology top 10s by CRN. From top 10 cybersecurity stories to the top 10 mobile apps of 2019 – it’s all here!

Top 10 Smarter with Gartner Articles for 2019

No report, survey, or listicle is complete without Gartner. Here are the 10 best "Smarter with Gartner" articles from 2019.


Industry insights

What’s New with AWS

A quick video to recap the latest AWS updates and announcements (there are many!) and a web link, too, in case you want to explore categories in more detail.

IBM Z Open Editor Support for LSP

Any programmer can attest to the power – and the usefulness – of Language Server Protocol (LSP). Now its integration with IBM’s Z Open Editor opens a whole new level for coders across the globe.


From CloudIQ

Creating AWS Security Groups for Kubernetes

We are going to discuss creating security groups in AWS for Kubernetes. The goal is to set up a Kubernetes cluster on AWS EC2, having provisioned your virtual machines.

Deploy a Spring-Boot Application in Kubernetes Pod using Jenkins CI/CD Pipeline

Kubernetes has become the preferred platform of choice for container orchestration. Here is a walkthrough of how Jenkins CI/CD pipeline is used to deploy a spring boot application in K8s.

Signing off now and will see you all in the new year. Party hard and bring in 2020 in style.

Kubernetes has become the preferred platform for container orchestration and for delivering maximum operational efficiency. To understand how K8s works, one must understand its most basic execution unit – a pod.

Kubernetes doesn’t run containers directly, rather through a higher-level structure called a pod. A pod has an application’s container (or, in some cases, multiple containers), storage resources, a unique network IP, and options that govern how the container(s) should run.

Pods can hold multiple containers or just one. Every container in a pod will share the same resources and network. A pod is used as a replication unit in Kubernetes; hence, it is advisable to not add too many containers in one pod. Future scaling up would lead to unnecessary and expensive duplication.

To maximize the ease and speed of Kubernetes, DevOps teams like to add automation using Jenkins CI/CD pipelines. Not only does this make the entire process of building, testing, and deploying software go faster, but it also minimizes human error. Here is how Jenkins CI/CD pipeline is used to deploy a spring boot application in K8s.

TASK on Hand:

Create a Jenkins pipeline to dockerize a Spring application, build the Docker image, push it to a Docker Hub repo, and then pull the image into an AKS cluster to run it in a pod.

Complete repository:

All the files required for this task are available in this repository:
https://github.com/saiachyuth5/simple-spring

Pre-Requisites:

A Spring Boot application and a Dockerfile to containerize it.

STEPS:

1. Install Jenkins:

2. Connect host docker daemon to Jenkins:

  • Run the command chown -R jenkins:docker filename/foldername (for example on /var/run/docker.sock) to allow Jenkins to access Docker.
  • Go to Manage Jenkins > Configure System in the browser and scroll to the bottom.
  • Click the 'add cloud' dropdown and add Docker. Add the Docker host URI in the format tcp://hostip:4243
  • Click verify connection to check your connection. If everything was done right, the Docker version is displayed.

3. Adding global credentials:

  • Go to Credentials on the Jenkins dashboard, click global credentials, and then Add credentials.
  • Select the kind as Microsoft Azure Service Principal and enter the required IDs. Similarly, save the Docker credentials under the 'Username with password' kind.

4. Create the Jenkinsfile:

  • Refer to the official Jenkins documentation for the pipeline syntax, usage of Jenkinsfile, and simple examples.
  • Below is the Jenkinsfile used for this task.

Jenkinsfile:

NOTE: While this example uses actual IDs to log in to Azure, it is recommended to use stored credentials to avoid hard-coding the exact parameters.

pipeline {
    environment {
        registryCredential = "docker"
    }
    agent any
    stages {
        stage('Build') {
            steps {
                script {
                    sh 'mvn clean install'
                }
            }
        }
        stage('Load') {
            steps {
                script {
                    app = docker.build("cloud007/simple-spring")
                }
            }
        }
        stage('Deploy') {
            steps {
                script {
                    docker.withRegistry("https://registry.hub.docker.com", registryCredential) {
                        // dockerImage.push()
                        app.push("latest")
                    }
                }
            }
        }
        stage('Deploy to ACS') {
            steps {
                withCredentials([azureServicePrincipal('dbb6d63b-41ab-4e71-b9ed-32b3be06eeb8')]) {
                    sh 'echo "logging in"'
                    sh 'az login --service-principal -u **************************** -p ********************************* -t **********************************'
                    sh 'az account set -s ****************************'
                    sh 'az aks get-credentials --resource-group ilink --name mycluster'
                    sh 'kubectl apply -f sample.yaml'
                }
            }
        }
    }
}

5. Create the Jenkins project:

  • Select New Item > Pipeline and click OK.
  • Scroll to the bottom and select the definition as Pipeline script from SCM.
  • Select the SCM as Git, enter the Git repo to be used, and give the path to the Jenkinsfile in the Script Path field.
  • Click Apply, and the Jenkins project is now created.
  • Go to My Views, select your view, and click Build to build your project.

6. Create and connect to Azure Kubernetes cluster:

  • Create an Azure Kubernetes cluster with 1-3 nodes and add its credentials to the global credentials in Jenkins.
  • Install the Azure CLI on the Jenkins host machine.
  • Use shell commands in the pipeline to log in, run get-credentials, and then create a pod using the required YAML file (a sketch of the cluster setup follows).
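
For reference, a minimal Azure CLI sketch of creating and verifying such a cluster (resource group and cluster names mirror the Jenkinsfile above; node count and location are assumptions):

az group create --name ilink --location eastus
az aks create --resource-group ilink --name mycluster --node-count 1 --generate-ssh-keys

# Verify access from the Jenkins host before wiring it into the pipeline
az aks get-credentials --resource-group ilink --name mycluster
kubectl get nodes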

YAML used:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-helloworld
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spring-helloworld
  template:
    metadata:
      labels:
        app: spring-helloworld
    spec:
      containers:
      - name: spring-helloworld
        image: cloud007/simple-spring:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 80

Here are some common problems faced during this process and the troubleshooting procedure.

  • Corrupt Jenkins exec file:
    Solved by doing an apt purge and then an apt install of Jenkins.
  • Using a 32-bit VM:
    kubectl is not supported on a 32-bit machine, so make sure the system is 64-bit.
  • Installing the Azure CLI manually makes it inaccessible to non-root users:
    Manually installing the Azure CLI placed it in default directories that were not accessible by non-root users, and hence not by Jenkins. So, it is recommended to install the Azure CLI using apt.
  • Installing minikube for a local cluster instead of AKS:
    VirtualBox does not support nested VT-x virtualization and hence cannot run minikube. It is recommended to enable Hyper-V and use it as the driver to run minikube.
  • Naming the stages in the Jenkinsfile:
    Jenkins did not accept multi-word stage names such as 'Build Docker Image' for some reason. Use a single word like 'Build', 'Load', etc.
  • Jenkins stopped building the project when the system ran out of memory:
    Make sure the host has at least 20 GB free on the hard disk before starting the project.
  • Jenkins couldn't execute Docker commands:
    Try the command usermod -aG docker jenkins
  • Spring app not accessible from an external IP:
    Created a new Service of type LoadBalancer, pointed it at the pod, and the application became accessible from the new external IP (see the sketch below).

In an upcoming article, we will show you how to deploy a pod containing three applications using a Jenkins CI/CD pipeline and update them selectively.

Welcome to Cloud View!

This week we present to you some fresh articles on the evolution of some of the most dominant technologies for the coming decade.

Predictions

Blockchain in 2020

2019 has been a year of great highs and some lows for blockchain technology. The next year is going to see some significant growth and consolidation of large blockchain protocols and digital assets.

Cybersecurity in 2020

Everyone agrees that security is going to be ALL IMPORTANT in the coming year. Here is an article that positions 2020 as the year of the breach. We guarantee you will bump security to the top of your list after reading this.


Industry insights

New features in Azure Monitor Metrics Explorer

A few months ago, Microsoft’s Azure clients gave some feedback regarding the use of metrics in Azure Portal. Now the Microsoft team comes back with some new features which address the main concerns of the community.


The Update Framework (TUF)

The ninth to join the CNCF’s list of mature technologies – The Update Framework (TUF) is an open-source technology that secures software update systems.


From CloudIQ

Kubernetes Deployment Controller – An Inside Look

Kubernetes Deployment Controller helps monitor and manage the upgrade, downgrade, and scaling of services without any disruption or downtime. Here’s a detailed look at the inner workings of Kubernetes Deployment Controller.

Implementing Azure AD Pod Identity in AKS Cluster

Cloud-based identity and access management service becomes a necessity for connecting pods in AKS cluster to access other Azure cloud resources and services. Here is a detailed look at how Azure AD Pod Identity helps.

Containers and container orchestration have become the default system for any DevOps team that wants to scale on-demand, reduce costs, and deliver faster. And to get the best out of container technology, Kubernetes is the way to go. A recommended Kubernetes practice is to manage pods through a Deployment; this way, they can be monitored and restarted if a failure occurs.

A deployment is created by using a Kubernetes Deployment Controller object. The application (in a container) is deployed to Kubernetes by declaratively passing a desired state to the Kubernetes Deployment Controller. A K8s deployment controller object is utilized for monitoring, management of upgrade, downgrade, and scaling of services (e.g., pods) without any disruption or downtime. This is made possible because the deployment controller is the single source of truth for the sizes of new and old replica sets. It maintains multiple replica sets, and when you describe a desired state, the DC changes the actual state at the correct pace.

Here’s a detailed look at the inner workings of Kubernetes Deployment Controller

The K8s Deployment controller is responsible for the following functions:

– Managing a set of pods in the form of Replica Sets & Hash-based labels
– Rolling out new versions of application through new Replica Sets
– Rolling back to old versions of application through old Replica Sets
– Pause & Resume Rollout/Rollback functions
– Scale-Up/Down functions
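
These functions map directly onto everyday kubectl operations; a few illustrative commands follow (the deployment, container, and image names here are hypothetical):

kubectl set image deployment/my-app my-app=cloud007/simple-spring:2.0   # roll out a new version (creates a new ReplicaSet)
kubectl rollout undo deployment/my-app                                  # roll back to the previous ReplicaSet
kubectl rollout pause deployment/my-app                                 # pause a rollout
kubectl rollout resume deployment/my-app                                # resume it
kubectl scale deployment/my-app --replicas=5                            # scale up/down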

“The Kubernetes controller manager is a daemon that embeds the core control loops shipped with Kubernetes. In applications of robotics and automation, a control loop is a non-terminating loop that regulates the state of the system. In Kubernetes, a controller is a control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state. Examples of controllers that ship with Kubernetes today are the replication controller, endpoints controller, namespace controller, and serviceaccounts controller.”

func NewControllerInitializers(loopMode ControllerLoopMode) map[string]InitFunc {
        controllers := map[string]InitFunc{}
        controllers["endpoint"] = startEndpointController
        controllers["endpointslice"] = startEndpointSliceController
        controllers["replicationcontroller"] = startReplicationController
        controllers["podgc"] = startPodGCController
        controllers["resourcequota"] = startResourceQuotaController
        controllers["namespace"] = startNamespaceController
        controllers["serviceaccount"] = startServiceAccountController
        controllers["garbagecollector"] = startGarbageCollectorController
        controllers["daemonset"] = startDaemonSetController
        controllers["job"] = startJobController
        controllers["deployment"] = startDeploymentController
        controllers["replicaset"] = startReplicaSetController
        controllers["horizontalpodautoscaling"] = startHPAController
        controllers["disruption"] = startDisruptionController
        controllers["statefulset"] = startStatefulSetController
        controllers["cronjob"] = startCronJobController
        controllers["csrsigning"] = startCSRSigningController
        controllers["csrapproving"] = startCSRApprovingController
        controllers["csrcleaner"] = startCSRCleanerController
        controllers["ttl"] = startTTLController
        controllers["bootstrapsigner"] = startBootstrapSignerController
        controllers["tokencleaner"] = startTokenCleanerController
        controllers["nodeipam"] = startNodeIpamController
        controllers["nodelifecycle"] = startNodeLifecycleController
 	if loopMode == IncludeCloudLoops {
                controllers["service"] = startServiceController
                controllers["route"] = startRouteController
                controllers["cloud-node-lifecycle"] = startCloudNodeLifecycleController
                // TODO: volume controller into the IncludeCloudLoops only set.
        }
        controllers["persistentvolume-binder"] = startPersistentVolumeBinderController
        controllers["attachdetach"] = startAttachDetachController
        controllers["persistentvolume-expander"] = startVolumeExpandController
        controllers["clusterrole-aggregation"] = startClusterRoleAggregrationController
        controllers["pvc-protection"] = startPVCProtectionController
        controllers["pv-protection"] = startPVProtectionController
        controllers["ttl-after-finished"] = startTTLAfterFinishedController
        controllers["root-ca-cert-publisher"] = startRootCACertPublisher

        return controllers
}

Let's look at the inner workings of the Deployment controller. It watches for the following object updates.

func startDeploymentController(ctx ControllerContext) (http.Handler, bool, error) {
        if !ctx.AvailableResources[schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}] {
                return nil, false, nil
        }
        dc, err := deployment.NewDeploymentController(
                ctx.InformerFactory.Apps().V1().Deployments(),
                ctx.InformerFactory.Apps().V1().ReplicaSets(),
                ctx.InformerFactory.Core().V1().Pods(),
                ctx.ClientBuilder.ClientOrDie("deployment-controller"),
        )
        if err != nil {
                return nil, true, fmt.Errorf("error creating Deployment controller: %v", err)
        }
        go dc.Run(int(ctx.ComponentConfig.DeploymentController.ConcurrentDeploymentSyncs), ctx.Stop)
        return nil, true, nil
}

The “Deployment Controller” initializes the following Event handlers.

dInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
                AddFunc:    dc.addDeployment,
                UpdateFunc: dc.updateDeployment,
                // This will enter the sync loop and no-op, because the deployment has been deleted from the store.
                DeleteFunc: dc.deleteDeployment,
        })
        rsInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
                AddFunc:    dc.addReplicaSet,
                UpdateFunc: dc.updateReplicaSet,
                DeleteFunc: dc.deleteReplicaSet,
        })
        podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
                DeleteFunc: dc.deletePod,
        })

Since Kubernetes uses asynchronous programming, the events are processed through work queues and workers.

func (dc *DeploymentController) addDeployment(obj interface{}) {
        d := obj.(*apps.Deployment)
        klog.V(4).Infof("Adding deployment %s", d.Name)
        dc.enqueueDeployment(d)
}

The items from the queue are handled by the "syncDeployment" handler. Some of the functions performed by the handler are shown below.

// List ReplicaSets owned by this Deployment, while reconciling ControllerRef
        // through adoption/orphaning.
        rsList, err := dc.getReplicaSetsForDeployment(d)
	
	// List all Pods owned by this Deployment, grouped by their ReplicaSet.
        // Current uses of the podMap are:
        //
        // * check if a Pod is labeled correctly with the pod-template-hash label.
        // * check that no old Pods are running in the middle of Recreate Deployments.
        podMap, err := dc.getPodMapForDeployment(d, rsList)

	// Update deployment conditions with an Unknown condition when pausing/resuming
        // a deployment. In this way, we can be sure that we won't timeout when a user
        // resumes a Deployment with a set progressDeadlineSeconds.
        if err = dc.checkPausedConditions(d); err != nil {
                return err
        }

	// rollback is not re-entrant in case the underlying replica sets are updated with a new
        // revision so we should ensure that we won't proceed to update replica sets until we
        // make sure that the deployment has cleaned up its rollback spec in subsequent enqueues.
        if getRollbackTo(d) != nil {
                return dc.rollback(d, rsList)
        }

        scalingEvent, err := dc.isScalingEvent(d, rsList)
        if err != nil {
                return err
        }
        if scalingEvent {
                return dc.sync(d, rsList)
        }

        switch d.Spec.Strategy.Type {
        case apps.RecreateDeploymentStrategyType:
                return dc.rolloutRecreate(d, rsList, podMap)
        case apps.RollingUpdateDeploymentStrategyType:
                return dc.rolloutRolling(d, rsList)
        }

Sync is responsible for reconciling deployments on scaling events or when they are paused.

func (dc *DeploymentController) sync(d *apps.Deployment, rsList []*apps.ReplicaSet) error {
        newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(d, rsList, false)
        if err != nil {
                return err
        }
        if err := dc.scale(d, newRS, oldRSs); err != nil {
                // If we get an error while trying to scale, the deployment will be requeued
                // so we can abort this resync
                return err
        }

        // Clean up the deployment when it's paused and no rollback is in flight.
        if d.Spec.Paused && getRollbackTo(d) == nil {
                if err := dc.cleanupDeployment(oldRSs, d); err != nil {
                        return err
                }
        }

        allRSs := append(oldRSs, newRS)
        return dc.syncDeploymentStatus(allRSs, newRS, d)
}


// scale scales proportionally in order to mitigate risk. Otherwise, scaling up can increase the size
// of the new replica set and scaling down can decrease the sizes of the old ones, both of which would
// have the effect of hastening the rollout progress, which could produce a higher proportion of unavailable
// replicas in the event of a problem with the rolled out template. Should run only on scaling events or
// when a deployment is paused and not during the normal rollout process.

func (dc *DeploymentController) scale(deployment *apps.Deployment, newRS 
*apps.ReplicaSet, oldRSs []*apps.ReplicaSet) error {

 	// If there is only one active replica set then we should scale that up to the full count of the
        // deployment. If there is no active replica set, then we should scale up the newest replica set.
        if activeOrLatest := deploymentutil.FindActiveOrLatest(newRS, oldRSs); activeOrLatest != nil {


	// If the new replica set is saturated, old replica sets should be fully scaled down.
        // This case handles replica set adoption during a saturated new replica set.
        if deploymentutil.IsSaturated(deployment, newRS) {

 // There are old replica sets with pods, and the new replica set is not saturated. 
        // We need to proportionally scale all replica sets (new and old) in case of a
        // rolling deployment.
        if deploymentutil.IsRollingUpdate(deployment) {

		// Number of additional replicas that can be either added or removed from the total
                // replicas count. These replicas should be distributed proportionally to the active
                // replica sets.
                deploymentReplicasToAdd := allowedSize - allRSsReplicas

                // The additional replicas should be distributed proportionally amongst the active
                // replica sets from the larger to the smaller in size replica set. Scaling direction
                // drives what happens in case we are trying to scale replica sets of the same size.
                // In such a case when scaling up, we should scale up newer replica sets first, and
                // when scaling down, we should scale down older replica sets first.

We hope this article helped you understand the inner workings of Kubernetes deployment controller. If you would like to learn more about Kubernetes and get certified, join our 2-day Kubernetes workshop.

Welcome to Cloud View!

For the last couple of weeks we have been curating predictions for the coming year (and decade) from well-regarded sources. Now it's time to drill down deeper into specific areas and find out what experts in the field see in store for the future.

Predictions

Cybersecurity: Mitigating cyber-attacks and risks

Forbes has put out a really exhaustive list of predictions (141 to be exact!) in the cyber security realm. These are all from key players and professionals in the digital arena – CIOs, CEOs, CFOs and security heads from across the digital spectrum weigh in with what they think is crucial to mitigate cyber attacks and risks in the coming few years.

Future of DevOps

DevOps is all about bringing the power of collaboration to executing business ideas; turning organizational visions into applications that drive growth and profits. So what does the future hold for the DevOps community?


Industry Speak

AT&T integrating 5G with Microsoft cloud

5G has been in the news for all the wrong reasons, but finally we see some interesting news emerging from the industry. A strategic partnership between Microsoft and AT&T announces that AT&T's 5G core will run on Azure!


Kubernetes for exponential growth

Containers and container orchestration with Kubernetes are vital for any tech-based business looking to deliver more features – faster and more affordably. Here is a look at how AlphaSense, one of the top AI start-ups leveraged Kubernetes to accelerate growth.


From CloudIQ

Optimizing Azure Cosmos DB Performance

Azure Cosmos DB allows Azure platform users to elastically and independently scale throughput and storage across any number of Azure regions worldwide. Here is an article on how to optimize Cosmos DB performance.

How to Debug and Troubleshoot Common Problems in Kubernetes Deployments

Kubernetes deployment issues are not always easy to troubleshoot. In some cases, the errors can be resolved easily, and, in some cases, detecting errors requires us to dig deeper and run various commands to identify and resolve the issues. Here is a guided tutorial to debug applications that are deployed into Kubernetes.

Welcome to Cloud View!

With the new year 2020 coming closer, the industry is firmly looking towards the future. Cloud news is full of predictions for the year ahead and here’s a quick selection we picked for this week’s reading.

Predictions

Forrester’s cloud computing predictions for 2020

Forrester has an excellent track record of predicting the right cloud trends. And that makes their 2020 cloud computing predictions a MUST-READ. Here is a breakdown from TechRepublic.

Gartner’s top strategic predictions for 2020 and beyond

Last week we put the spotlight on Gartner's strategic trends; this week we delve further into them to understand how they would affect people's lives and work. Unsurprisingly, AI takes center-stage again.


Industry Insights

Telcos embrace containers

Gartner predicts that over 75% of global companies will run containerized applications by 2022. Kubernetes is the leading container orchestration platform for managing these containers. Here is a look at how Telcos are planning to use it to deploy cloud-native 5G networks.


AWS IoT Day – Eight Powerful New Features

AWS regularly puts out bundled themed announcements, which make it easy for us to find relevant information in one place. Here is the one related to AWS IoT Day. Check out 8 powerful AWS features, from secret tunneling to Alexa voice service integration and more.


Interesting announcements from KubeCon

Over 100 announcements were made at KubeCon; here's a quick read of the 10 most important ones.


This week at CloudIQ

Kubernetes on Azure: A 2-day workshop for AKS developers

Container technology has revolutionized the DevOps landscape and offers organizations the chance to develop and test applications faster and more cost-effectively. CloudIQ’s 2-day hands-on workshop is designed to give DevOps team members the opportunity to skill-up and learn Kubernetes design, deployment, and management.

Configuring Palo Alto Networks Next-Generation Firewall (NGFW) – A Detailed Guide

Today organizations require an enterprise cyber-security platform, which provides network security, cloud security, endpoint protection, & various related cloud-delivered security services. Palo Alto Networks Next-Generation Firewall (NGFW) fits the bill and here is a detailed guide on configuring it.

End-to-end front-end testing has always been a bit of a pain for developers. Testing is one of the critical final steps of any development project; however, web testing has tested the patience of all developers at one time or another. The modern web testing ecosystem comes with its own set of challenges, from data security to additional time and expense to managing the dynamic behavior of contemporary development frameworks. Hence the need to bring automation to the testing process!

Benefits of Automation Testing
  • Automation increases the speed of test execution
  • Automation helps increase Test Coverage
  • One can do automation testing at the time of regression work
  • Automation testing works well when the GUI stays the same, even when there are a lot of functional changes underneath

When to use Automation Testing?

Here are some scenarios where Automation testing is highly recommended

  • Requirements do not change frequently
  • Access to the application for load and performance with many virtual users
  • Steady software with respect to manual testing
  • Availability of time
  • Huge and business-critical projects
  • Projects that need to test the same areas often
Automation testing step by step

There are lots of helpful tools to write automation scripts, however, before using those tools it’s important to identify the process for test automation.

  • Identify areas within the software to automate
  • Choose the appropriate tool for test automation
  • Write test scripts
  • Develop test suits
  • Execute test scripts
  • Build result reports
  • Find possible bugs or performance issues
List of automation tools:

Automation of testing frameworks helps us to improve the quality, speed, and accuracy of the testing processes. Here is a list of automation tools,

  • Cypress
  • Selenium
  • Protractor
  • Appium(Mobile)
Why choose Cypress:

Cypress solves many of the main testing bottlenecks developers face regularly. It is a JavaScript-based end-to-end testing framework that doesn't use Selenium (the most widely used testing tool) at all. It is built on top of Mocha, a JavaScript test framework that runs in the browser and makes asynchronous testing simple. Cypress automatically waits for DOM elements to load, for elements to become visible, for AJAX calls to finish, and so on. Hence, we don't need to use implicit and explicit waits.

Another advantage Cypress offers to developers is that it runs directly in the browser with no network communication. The architecture makes testing and development happen simultaneously. It allows developers access to tools, and they can make changes and see them reflected in real-time. Naturally, this lends more precision and speed to the whole process.

Features of Cypress:
  • Time travel: Cypress takes snapshots as your test runs.
  • Debuggability: Cypress provides readable errors and stack traces that make it easier to work out why a test case failed.
  • Automatic waiting: There is no need to use wait or sleep because it automatically waits for your commands.
  • Spies, stubs, and clocks: verify and control the behavior of functions, server response, and timer.
  • Screenshot and video: Cypress testing automatically takes screenshots when your test case fails and makes a video of the complete result when it is run from the CLI.
Features of Mocha:

Mocha provides the below benefits,

  • Browser support
  • Async & promises support
  • Test coverage reporting
Advantage of Cypress:
  • Open-source
  • It has Promise Support
  • JavaScript testing framework
  • Easy and reliable testing
  • Fast, free and open-source
  • Easy to control our response, headers, and status.
  • Helps you in finding the locator
Installing Cypress:

Installing Cypress is an easy task compared to a Selenium installation. There are two commands used to install Cypress on machines. These are,

  1. npm init
  2. npm install cypress

The first command is used to create a “package.json” file, and the second command is used to install all Cypress dependencies.
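
A minimal sketch of running these from a terminal (the -y and --save-dev flags are optional additions, not part of the original steps):

npm init -y                      # creates package.json with default answers
npm install cypress --save-dev   # installs Cypress and its dependencies into node_modules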

Project Folder Structure Details:

The project folder structure is as follows:

*node_modules folder – It is the directory where the installed npm packages (build tools and dependencies) live.

*package.json file – It is the file in the app root that defines which libraries will be installed into node_modules when you run "npm install".

*cypress folder – It contains folders like fixtures, integration, plugins, screenshots, support, and videos. These folders are described below,

a. Fixture – This folder holds external pieces of static data that can be used in your tests.

b. Integration – This folder is used to write the test cases for your app.

c. Screenshot – This folder is used to store screenshots of your tests.

d. Video – This folder is used to store videos of your test runs.

e. Support – This folder is used to write common commands files.

Write your sample program in Cypress:

Step 1: Open Visual Studio Code on your machine.

Step 2: Create a new Cypress project folder and name it "cypresse2e".

Step 3: Open the command line and go to the project path created above.

Step 4: Type the first command under the "Installing Cypress" heading, then wait for it to create the package.json file.

Step 5: After that, type the second command

Step 6: The above task will finish within 2-3 minutes, creating the cypress and node_modules folders inside the "cypresse2e" folder. This folder will also contain the package.json file.

Step 7: Click the Cypress folder under “cypresse2e” in vs code.

Step 8: Automation page details are as below,

We will use the CloudIQ home page link for this automation

Step 9: Create the "cypressAutomation.spec.ts" file under the integration folder and write the test program as seen below in the screenshot,

Program Explanation:

Here is what the test script given above does.

  • Navigate to “cloudiqtech” site.
  • Wait for 10 seconds for the page to load
  • Next click on “AWS” ref link
  • Then navigate to “AWS” page
  • Finally, validate the current page as the AWS page.

Step 10: Open the command terminal, go to your Cypress project path, and run the command shown below.
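
The exact command isn't reproduced here; the usual way to launch the Cypress Test Runner from the project root (assuming the local installation above) is:

npx cypress open    # opens the interactive Cypress Test Runner
# or run headlessly from the CLI:
npx cypress run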

Step 11: After waiting for 1-2 minutes, it will open the Cypress Terminal app, as shown below in the screenshot. It contains all the tests – like the ones you wrote in your automation test and default tests.

Step 12: Click your "cypressAutomation.spec.js" file; this automatically opens the default Chrome browser to run your test and produces a test coverage report in your browser, as shown below.

Test Result:

*Three tests passed successfully.

*No tests failed here.

*In total, these three tests ran within 30.44 seconds.

*Screenshots were automatically taken during your tests. If you hover over a test case in the test runner, it will display the screenshot image for each separate test case.

Welcome to Cloud View!

The cloud computing landscape is evolving, innovating, and expanding almost at the speed of thought! Every week there are hundreds of announcements, insights, opinions, and studies discussing new trends, ideas, and technological breakthroughs. To help you get the most relevant information, we have put together a weekly curated list of must-read articles under Cloud View.

Predictions

Cloud computing in 2020

As 2019 winds down, our eyes turn to the next year. Let’s find out what the pundits are predicting for cloud computing in 2020.
Hybrid cloud is passé as omni-cloud becomes the preferred enterprise approach; Kubernetes is all set to become the dominant talking point in tech conversations in 2020, and AI will become omnipresent.

How will upcoming technology affect humans – at work or at home?

How can IT leaders invest in today’s technology for a future payoff? Here are the top 10 strategic tech trends by Gartner – a look into the future. A must-read before planning for the coming year.

AI and Robotics

No talk of the future is complete without AI and robotics! Keith Shaw, editor-in-chief of Robotics Business Review, shares some insights on where robotics is heading.

Industry Insights

Microsoft and Google Cloud’s battle for the enterprise

Google Cloud is aggressively trying to elbow into the cloud market, leading to a battle royale. All the cloud giants have deep pockets and are not afraid of investing BIG, which is great for innovation and enterprise clients. Check out the whole article for a more in-depth look at how the race for cloud dominance is playing out.  

Confidential Computing

Microsoft brings confidential computing capability to Kubernetes workloads – an additional layer of security to keep business data safe.

This week at CloudIQ

 

CloudIQ Technologies is now a Kubernetes Certified Service Provider (KCSP)

Cloud Native Computing Foundation (CNCF) recognizes CloudIQ as one of the few (122 as on Nov 19, 2019) service providers worldwide to receive KCSP certification. As a KCSP, CloudIQ will be able to access collaborative group support to help its clients develop and deploy cloud native applications quickly and efficiently. The CNCF partnership program also puts CloudIQ in touch with organizations looking to design, adopt and implement cloud native solutions.

 

Understanding Kubernetes Concepts – A QuickStart Guide 

Container technology made software development more agile, however, containers need to be tracked, monitored, and managed, which is where container orchestration and Kubernetes come in. Here is a quick start guide to understanding Kubernetes concepts.

At a Glance

Seattle, WA, November 21, 2019 – CloudIQ Technologies, a fast-growing premier cloud consulting and solutions provider, announced today its status as a Kubernetes Certified Service Provider (KCSP).

Cloud Native Computing Foundation (CNCF) is a non-profit member of the Linux Foundation that promotes cloud-native computing and is the certification body for Kubernetes. CNCF recognizes CloudIQ as one of the few (121 as on date) service providers worldwide to receive KCSP certification.

“We are so thrilled to welcome CloudIQ Technologies into the CNCF family,” said Dan Kohn, Executive Director of the Cloud Native Computing Foundation. “As part of our select group of Kubernetes Certified Services Providers (KCSPs), CloudIQ will be an integral part of our Kubernetes platform outreach and will be available to help organizations successfully adopt Kubernetes.”

As a KCSP, CloudIQ will be able to access collaborative group support to help its clients develop and deploy cloud native applications quickly and efficiently. The CNCF partnership program also puts CloudIQ in touch with organizations looking to design, adopt and implement cloud native solutions.

CloudIQ’s technology experts, who have more than a decade of varied technical and business experience, deliver the full range of cloud-native / open source solutions to its clients through its team of Azure & AWS certified architects, Certified Kubernetes Administrators (CKA), Certified Kubernetes Developers, and DevSecOps professionals.

“We have been extremely happy to work closely with the CNCF community to push the boundaries of the Kubernetes platform further and pass on the benefits of cloud-native technologies to our clients,” said Mr. Prem Kandalu, CEO. “As a part of the Cloud Native Computing Foundation, we will continue to develop new capabilities and strengthen our cloud strategy and business. We hope our contribution to the pooled expertise at CNCF delivers greater speed, scale, economic advantages to developers across the board and leads to the growth of the entire K8s community.”

About CloudIQ Tech:

CloudIQ is a leading cloud consulting and solutions firm with deep industry expertise in building cloud-native solutions that help customers realize the cost, scale, and security benefits of the cloud. As strategic advisors to several Fortune 500 companies, CloudIQ provides operational strategies, monitoring & assistance with command centers, and application transformation guidance for comprehensive cloud migrations.

For more information visit our site https://www.cloudiqtech.com
Or you are welcome to contact us by calling us at +1 (206) 203-4151
Or mailing at [email protected]

Cosmos DB is Microsoft Azure’s hugely successful tool to help their clients manage data on a global scale. This multi-model database service allows Azure platform users to elastically and independently scale throughput and storage across any number of Azure regions worldwide.

As Cosmos DB supports multiple data models, you can take advantage of fast, single-digit-millisecond data access using any of your favorite APIs, including SQL, MongoDB, Cassandra, Tables, or Gremlin. Being a NoSQL database, anyone with experience in MongoDB can easily work with Cosmos DB, while its SQL API makes it just as approachable for anyone with SQL knowledge.

Why Cosmos DB?

For organizations looking to build a flexible and scalable database that is globally distributed, Cosmos DB is especially useful as it

  • provides a ready-to-use, extremely dynamic database service
  • guarantees low latency of less than 10 milliseconds when reading data and less than 15 milliseconds when writing data.
  • offers customers a faster, completely seamless experience.
  • offers 99.99% availability

Here are some tried and tested tips from our senior Azure expert on how to get the most out of Cosmos DB.

Data Modeling

Cosmos DB is great because it lets you model semi/unstructured aggregates and dynamic entities. This means it's very easy to model ever-changing entities, entities that don't all share the same attributes, and hierarchical aggregates. To model for Cosmos, you need to think in terms of hierarchy and aggregates instead of entities and relations. NoSQL lets you, say, store a thing that has other things, which have things of their own, and then give you the whole hierarchy of things back. So, you don't have a person, rental addresses, and a relation between them; instead, you have rental records, which aggregate for each person what rental addresses they've had.

The following NoSQL rules apply to Cosmos DB as well.

  • Should be used as a complement to an existing or additional database.
  • Deal with PACELC theorem, an extension of CAP theorem.
  • A data modeler should think in terms of queries instead of in terms of storage
Connection Types

Cosmos DB can be connected to the application by 2 modes.

  • Gateway mode
  • Direct mode

The gateway mode is the default mode in the Microsoft.Azure.DocumentDB SDK. It uses the HTTPS port with a single endpoint, while the direct mode is the default for the .NET V3 SDK and uses TCP and HTTPS for connectivity.

The gateway mode is better when your application runs within a corporate network with strict network rules, because it has a single endpoint that can be allowed through the firewall for security. However, gateway mode performance will be lower compared to the direct mode.

There is also an option to connect through a RESTful programming model provided by the SDK. All the CRUD operations can be done through REST calls. This method is recommended if you need a client app to access the database directly instead of going through an API. Thus, the overhead of providing an API wrapper to consume Cosmos DB can be eliminated and a performance penalty avoided.

The direct mode is recommended in most scenarios, as it provides better performance.

I am taking the popular volcano data for comparing the response time between the SDK and RESTful model.

Query Executed in both the versions.

SELECT * FROM c where c.Status="Holocene"
Response details of the SDK

The query returned the data in 3,710 ms.

Response details of the RESTful model

The query returned the data in 5,810 ms.

If we developed an API with this mode, then the response time taken by our own API also needs to be considered. So, using the RESTful model inside an API is a trade-off in performance; use this mode to query from the client directly.

Partitioning the DB

The logical partition is the primary lever for getting good performance out of Cosmos DB transactions. For example, suppose you have a database with around 1,500 student records from a school. A simple search for a student named "Peter" will lead to a scan through all 1,500 entries, which consumes a lot of throughput to get the result. Now split the data logically by the "Grade" the students belong to. Querying for a student named "Peter" in "Grade 5" will lead the system to search only 30 or 40 students out of the total 1,500, saving throughput and improving performance compared to the earlier approach.

The common rules of thumb are listed below (a CLI sketch for creating a partitioned container follows).

a. Any key such as a city, state, or country kind of property can be used as the partition key.
b. No partition is required for up to 10 GB of data.
c. The query should be provided with the partition key to be searched.
d. The property selected as the partition key must be present in all the documents in the container.
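
For reference, a minimal Azure CLI sketch of creating a container with a partition key (account, resource group, database, and container names are hypothetical):

az cosmosdb sql container create \
  --account-name my-cosmos-account \
  --resource-group my-resource-group \
  --database-name VolcanoDB \
  --name VolcanoData \
  --partition-key-path "/country" \
  --throughput 400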

I am taking the popular Volcano data for testing the performance with and without Partition Key.

1. I initially created a collection without a partition key. The performance for the given query is

SELECT * FROM c where c.Status="Holocene"
Resultset
Metric                                       Value
Partition key range id                       0
Retrieved document count                     200
Retrieved document size (in bytes)           100769
Output document count                        200
Output document size (in bytes)              101069
Index hit document count                     200
Index lookup time (ms)                       0.21
Document load time (ms)                      1.29
Query engine execution time (ms)             0.33
System function execution time (ms)          0
User defined function execution time (ms)    0
Document write time (ms)                     0.52

2. Then I recreated the same collection with "/country" as the partition key. Now the same query, with Japan as the partition value, yields the values below.

Resultset
Metric                                       Value
Partition key range id                       0
Retrieved document count                     16
Retrieved document size (in bytes)           7887
Output document count                        16
Output document size (in bytes)              7952
Index hit document count                     16
Index lookup time (ms)                       0.23
Document load time (ms)                      0.17
Query engine execution time (ms)             0.06
System function execution time (ms)          0
User defined function execution time (ms)    0
Document write time (ms)                     0.01
Tune the Index

Indexing is always a top priority item in the checklist to tune performance. Indexing is an internal job that keeps track of the metadata about the data, which helps in finding the result set for a query. By default, all the properties of a Cosmos container are indexed. But this is not necessary; it is a useless overhead on the DB, and keeping track of a lot of data consumes RUs, which is not cost-effective. The better approach is to exclude all paths from indexing and then include only the paths that are used for querying in the application.

Indexing Mode                              With Default Indexing    With Custom Indexing
RU's Consumed                              3146.15                  19.83
Output Doc Count                           100                      100
Doc load time (in ms)                      646.51                   2.32
Query engine execution time (in ms)        434.26                   4.96
System function execution time (in ms)     57.03                    2.41
Paging

The execution of a query will, by default, return 100 documents. We can increase the number of documents by providing a "maxItemCount" value; the maximum is 1,000 documents. But it is rarely meaningful to take 1,000 documents at a time from the DB. To improve performance and to show a crisp result set to the user, always reduce the "maxItemCount". Unlike SQL databases, pagination is the default behavior in Cosmos. So, even if you provide a maximum count of 1,000 and the result of the query is larger than that, Cosmos returns a "continuation token". This token consists of a unique value that points to the query we ran and the page number, so we can implement the pagination logic at the front end. Thus, by reducing the number of documents per response, we can save throughput consumption, reduce network data transfer, and increase performance.

Throughput Management

RUs, or Request Units, is the common term we always come across when using Cosmos DB. When you read a document from a container or write a document to it, you are trading RUs with Cosmos for your operation. It is like currency: without money you can't buy anything, and without RUs you can't query anything. You can buy only the items that cost as much as or less than the money you have in hand; similarly, you can run only the queries that cost the RUs you have.

If you have large data and the query needs to traverse deep into the collection, then you need enough RUs. Adding a property to the index consumes some RUs. So, if all the properties are indexed, your RU spend on indexing will be high, and you will lack RUs for querying. So, always add to the index only the properties that are needed while querying.

Index properly, save the RUs, and use them for querying. For example, suppose you have 100K documents in the container and Cosmos DB consumes 1,000 RUs to run a query across up to 50K documents with default indexing; then our query will not reach the remaining 50K documents, and we will never receive them in our result set. If indexed appropriately, the same query will consume only 400 RUs to cover all 100K documents.

Startup latency

The very first query will always be a bit late because of the time it takes to awaken the connection. To overcome this latency, it is best practice to call “OpenAsync()” in SDK 2 in the beginning while creating the connection.

await client.OpenAsync();
Singleton Connection

The best approach is to connect to the DB once and keep the connection alive across all the instances of the application. Polling the DB periodically also keeps the connection alive. This reduces the DB connectivity latency.

Regions

Make sure the Cosmos and the applications are grouped within the same Azure Region. This reduces the latency a lot. The lowest possible latency is achieved by ensuring the calling application is located within the same Azure region as the provisioned Azure Cosmos DB endpoint.

Programming Best Practices
  • Always use the latest SDK version.
  • Use Streaming API (in SDK 3) that can receive and return data without serializing. Helpful when your API is just a relay and not doing any logical operations on the data.
  • Tune the queries.
  • Implement retry logic with reasonable waiting time to prevent throttle during a busy time.

By carefully analyzing all the above factors, we can improve the Cosmos DB query performance substantially.

Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery, and other functionalities to help businesses scale and grow.

It gives organizations a secure and robust platform to develop their custom cloud-based solutions and has several unique features that make it one of the most reliable and flexible cloud platforms, such as

  • Mobile-friendly access through AWS Mobile Hub and AWS Mobile SDK
  • Fully managed purpose-built Databases
  • Serverless cloud functions
  • Range of storage options that are affordable and scalable.
  • Unbeatable security and compliance

Following are some core services offered by AWS:

AWS Core services
  1. An EC2 instance is a virtual server in Amazon’s Elastic Compute Cloud (EC2) for running applications on the AWS infrastructure.
  2. Amazon Elastic Block Store (EBS) is a cloud-based block storage system provided by AWS that is best used for storing persistent data.
  3. Amazon Virtual Private Cloud (Amazon VPC) enables us to launch AWS resources into a virtual network that we have defined. This virtual network closely resembles a traditional network that we would operate in our own data center, with the benefits of using the scalable infrastructure of AWS.
  4. Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network.
  5. AWS security groups (SGs) are associated with EC2 instances and provide security at the protocol and port access level. Each security group — working much the same way as a firewall — contains a set of rules that filter traffic coming into and out of an EC2 instance.

Let us look more deeply at one of AWS’s core services – AWS CloudFormation – that is key for managing workloads on AWS.

1.   CloudFormation

AWS CloudFormation is a service that helps us model and set up our Amazon Web Services resources so that we can spend less time managing those resources and more time focusing on our applications that run in AWS.  We create a template that describes all the AWS resources that we want (like Amazon EC2 instances or S3 buckets), and AWS CloudFormation takes care of provisioning and configuring those resources for us. We don’t need to individually create and configure AWS resources and figure out what’s dependent on what; AWS CloudFormation handles all of that.

A stack is a collection of AWS resources that you can manage as a single unit. In other words, we can create, update, or delete a collection of resources by creating, updating, or deleting stacks. All the resources in a stack are defined by the stack’s AWS CloudFormation template.

2.   CloudFormation template

CloudFormation templates can be written in either JSON or YAML.  The structure of the template in YAML is given below:

---
AWSTemplateFormatVersion: "version date"

Description:
  String
Metadata:
  template metadata
Parameters:
  set of parameters
Mappings:
  set of mappings
Conditions:
  set of conditions
Resources:
  set of resources
Outputs:
  set of outputs

In the above yaml file,

  1. AWSTemplateFormatVersion – The AWS CloudFormation template version that the template conforms to.
  2. Description – A text string that describes the template.
  3. Metadata – Objects that provide additional information about the template.
  4. Parameters – Values to pass to our template at runtime (when we create or update a stack). We can refer to parameters from the Resources and Outputs sections of the template.
  5. Mappings – A mapping of keys and associated values that we can use to specify conditional parameter values, like a lookup table. We can match a key to a corresponding value by using the Fn::FindInMap intrinsic function in the Resources and Outputs sections.
  6. Conditions – Conditions that control whether certain resources are created or whether certain resource properties are assigned a value during stack creation or update. For example, we can conditionally create a resource that depends on whether the stack is for a production or test environment.
  7. Resources – Specifies the stack resources and their properties, such as an Amazon Elastic Compute Cloud instance or an Amazon Simple Storage Service bucket.  We can refer to resources in the Resources and Outputs sections of the template.
  8. Outputs – Describes the values that are returned whenever we view our stack’s properties. For example, we can declare an output for an S3 bucket name and then call the AWS cloudformation describe-stacks AWS CLI command to view the name.

Resources is the only required section in the CloudFormation template.  All other sections are optional.

3.   CloudFormation template to create S3 bucket

S3template.yml

Resources:
  HelloBucket:
    Type: AWS::S3::Bucket

In AWS Console, go to CloudFormation and click on Create Stack

Upload the template file which we created.  This will get stored in an S3 location, as shown below.

Click next and give a stack name

Click Next and then “Create stack”.  After a few minutes, you can see that the stack creation is completed.

Clicking on the Resources tab, you can see that the S3 bucket has been created with the name "s3-stack-hellobucket-buhpx7oucrgn".  AWS has generated this name since we didn't specify the BucketName property in the YAML.

Note that deleting the stack will delete the S3 bucket which it had created.
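
If you prefer the CLI to the console, the same stack can be created and cleaned up with the AWS CLI; a minimal sketch (the stack name is hypothetical, and the template file is the S3template.yml shown above):

aws cloudformation create-stack --stack-name s3-stack --template-body file://S3template.yml
aws cloudformation describe-stacks --stack-name s3-stack   # check status and outputs
aws cloudformation delete-stack --stack-name s3-stack      # also deletes the bucket it created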

4.   Intrinsic functions

AWS CloudFormation provides several built-in functions that help you manage your stacks.

In the below example, we create two resources – a Security Group and an EC2 Instance, which uses this Security Group.  We can refer to the Security Group resource using the !Ref function.

Ec2template.yml

Resources:
  Ec2Instance:
    Type: 'AWS::EC2::Instance'
    Properties:
      SecurityGroups:
        - !Ref InstanceSecurityGroup
      KeyName: mykey
      ImageId: ''
  InstanceSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '22'
          ToPort: '22'
          CidrIp: 0.0.0.0/0

Some other commonly used intrinsic functions are

  1. Fn::GetAtt – returns the value of an attribute from a resource in the template.
  2. Fn::Join – appends a set of values into a single value, separated by the specified delimiter. If the delimiter is an empty string, the set of values is concatenated with no delimiter.
  3. Fn::Sub – substitutes variables in an input string with values that you specify. In our templates, we can use this function to construct commands or outputs that include values that aren’t available until we create or update a stack.
5.   Parameters

Parameters enable us to input custom values to our template each time we create or update a stack.

TemplateWithParameters.yaml

Parameters: 
  InstanceTypeParameter: 
    Type: String
    Default: t2.micro
    AllowedValues: 
      - t2.micro
      - m1.small
      - m1.large
    Description: Enter t2.micro, m1.small, or m1.large. Default is t2.micro.
Resources:
  Ec2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType:
        Ref: InstanceTypeParameter
      ImageId: ami-0ff8a91507f77f867
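
To see the parameter in action, a minimal sketch of creating the stack from the CLI and overriding the default instance type (the stack name is hypothetical):

aws cloudformation create-stack \
  --stack-name ec2-param-stack \
  --template-body file://TemplateWithParameters.yaml \
  --parameters ParameterKey=InstanceTypeParameter,ParameterValue=m1.small
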
6.   Pseudo Parameters

Pseudo parameters are parameters that are predefined by AWS CloudFormation. We do not declare them in our template. Use them the same way as we would a parameter as the argument for the Ref function.

Commonly used pseudo parameters:

  1. AWS::Region – Returns a string representing the AWS Region in which the encompassing resource is being created, such as us-west-2
  2. AWS::StackName – Returns the name of the stack as specified during cloudformation create-stack, such as teststack
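
For example, a template could surface these pseudo parameters in its Outputs section, along the lines of this short sketch (the output names are arbitrary):

Outputs:
  DeployedRegion:
    Description: Region in which this stack was created
    Value: !Ref "AWS::Region"
  StackNameUsed:
    Description: Name given to this stack at creation time
    Value: !Ref "AWS::StackName"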
7.   Mappings

The optional Mappings section matches a key to a corresponding set of named values. For example, if you want to set values based on a region, we can create a mapping that uses the region name as a key and contains the values we want to specify for each specific region. We use the Fn::FindInMap intrinsic function to retrieve values in a map.

We cannot include parameters, pseudo parameters, or intrinsic functions in the Mappings section.

TemplateWithMappings.yaml

AWSTemplateFormatVersion: "2010-09-09"
Mappings: 
  RegionMap: 
    us-east-1:
      HVM64: ami-0ff8a91507f77f867
      HVMG2: ami-0a584ac55a7631c0c
    us-west-1:
      HVM64: ami-0bdb828fd58c52235
      HVMG2: ami-066ee5fd4a9ef77f1
    eu-west-1:
      HVM64: ami-047bb4163c506cd98
      HVMG2: ami-0a7c483d527806435
    ap-northeast-1:
      HVM64: ami-06cd52961ce9f0d85
      HVMG2: ami-053cdd503598e4a9d
    ap-southeast-1:
      HVM64: ami-08569b978cc4dfa10
      HVMG2: ami-0be9df32ae9f92309
Resources: 
  myEC2Instance: 
    Type: "AWS::EC2::Instance"
    Properties: 
      ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", HVM64]
      InstanceType: m1.small
8.   Outputs

The optional Outputs section declares output values that we can import into other stacks (to create cross-stack references), return in response to describe-stack calls, or view in the AWS CloudFormation console. For example, we can output the S3 bucket name for a stack to make the bucket easier to find.

In the below example, the output named StackVPC returns the ID of a VPC, and then exports the value for cross-stack referencing with the name VPCID appended to the stack’s name.

Outputs:
  StackVPC:
    Description: The ID of the VPC
    Value: !Ref MyVPC
    Export:
      Name: !Sub "${AWS::StackName}-VPCID"
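
In another stack, the exported value can then be consumed with the Fn::ImportValue function. The sketch below assumes the exporting stack was named my-network-stack, so the export resolves to my-network-stack-VPCID; the subnet properties are illustrative placeholders.

Resources:
  MySubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !ImportValue my-network-stack-VPCID   # name of exporting stack + "-VPCID"
      CidrBlock: 10.0.1.0/24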

As organizations start to create and maintain clusters in AKS (Azure Kubernetes Service), they also need to use cloud-based identity and access management service to access other Azure cloud resources and services. The Azure Active Directory (AAD) pod identity is a service that gives users this control by assigning identities to individual pods.  

Without these controls, accounts may get access to resources and services they don’t require. And it can also become hard for IT teams to track which set of credentials were used to make changes.

Azure AD Pod identity is just one small part of the container and Kubernetes management process and as you delve deeper, you will realize the true power that Kubernetes and Containers bring to your DevOps ecosystem.

Here is a more detailed look at how to use AAD pod identity for connecting pods in AKS cluster with Azure Key Vault.

Pod Identity

Integrate your key management system with Kubernetes using pod identity. Secrets, certificates, and keys in a key management system become a volume accessible to pods. The volume is mounted into the pod, and its data is available directly in the container file system for your application.

On an existing AKS cluster –

Deploy Key Vault FlexVolume to your AKS cluster with this command:

  • kubectl create -f https://raw.githubusercontent.com/Azure/kubernetes-keyvault-flexvol/master/deployment/kv-flexvol-installer.yaml
1. Create the Deployment

Run this command to create the aad-pod-identity deployment on an RBAC-enabled cluster:

  • kubectl apply -f https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/deployment-rbac.yaml

Or run this command to deploy to a non-RBAC cluster:

  • kubectl apply -f https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/deployment.yaml
2. Create an Azure Identity

Create azure managed identity

Command:- az identity create -g ResourceGroupNameOfAKsService -n aks-pod-identity(ManagedIdentity)

Output:-  

{
  "clientId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "clientSecretUrl": "https://control-westus.identity.azure.net/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/aks_dev_rg_wu/providers/Microsoft.ManagedIdentity/userAssignedIdentities/aks-pod-identity/credentials?tid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&oid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx&aid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "id": "/subscriptions/xxxxxxxx-xxxx-XXXX-XXXX-XXXXXXXXXXXX/resourcegroups/aks_dev_rg_wu/providers/Microsoft.ManagedIdentity/userAssignedIdentities/aks-pod-identity",
  "location": "westus",
  "name": "aks-pod-identity",
  "principalId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
  "resourceGroup": "au10515_aks_dev_rg_wu",
  "tags": {},
  "tenantId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX",
  "type": "Microsoft.ManagedIdentity/userAssignedIdentities"
}

Assign Cluster SPN Role

Command for getting the AKSServicePrincipalID:- az aks show -g <resourcegroup> -n <name> --query servicePrincipalProfile.clientId -o tsv

Command:- az role assignment create --role "Managed Identity Operator" --assignee <AKSServicePrincipalId> --scope <ID of Managed identity>

Assign Azure Identity Roles

Command:- az role assignment create --role Reader --assignee <Principal ID of Managed identity> --scope <KeyVault Resource ID>

Set policy to access keys in your Key Vault

Command:- az keyvault set-policy -n dev-kv --key-permissions get --spn <Client ID of Managed identity>

Set policy to access secrets in your Key Vault

Command:- az keyvault set-policy -n dev-kv --secret-permissions get --spn <Client ID of Managed identity>

Set policy to access certs in your Key Vault

Command:- az keyvault set-policy -n dev-kv --certificate-permissions get --spn <Client ID of Managed identity>

3. Install the Azure Identity

Save this Kubernetes manifest to a file named aadpodidentity.yaml:

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
  name: <a-idname>
spec:
  type: 0
  ResourceID: /subscriptions/<subid>/resourcegroups/<resourcegroup>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<name>
  ClientID: <clientId>

Replace the placeholders with your user identity values. Set type: 0 for user-assigned MSI or type: 1 for Service Principal.

Finally, save your changes to the file, then create the AzureIdentity resource in your cluster:

kubectl apply -f aadpodidentity.yaml

4. Install the Azure Identity Binding

Save this Kubernetes manifest to a file named aadpodidentitybinding.yaml:

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: demo1-azure-identity-binding
spec:
  AzureIdentity: <a-idname>
  Selector: <label value to match>

Replace the placeholders with your values. Ensure that the AzureIdentity name matches the one in aadpodidentity.yaml.

Finally, save your changes to the file, then create the AzureIdentityBinding resource in your cluster:

kubectl apply -f aadpodidentitybinding.yaml

Sample Nginx Deployment for accessing key vault secret using Pod Identity

Save this sample nginx pod manifest to a file named nginx-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: nginx-flex-kv-podid
    aadpodidbinding: 
  name: nginx-flex-kv-podid
spec:
  containers:
  - name: nginx-flex-kv-podid
    image: nginx
    volumeMounts:
    - name: test
      mountPath: /kvmnt
      readOnly: true
  volumes:
  - name: test
    flexVolume:
      driver: "azure/kv"
      options:
        usepodidentity: "true"         # [OPTIONAL] if not provided, will default to "false"
        keyvaultname: ""               # the name of the KeyVault
        keyvaultobjectnames: ""        # list of KeyVault object names (semi-colon separated)
        keyvaultobjecttypes: secret    # list of KeyVault object types: secret, key or cert (semi-colon separated)
        keyvaultobjectversions: ""     # [OPTIONAL] list of KeyVault object versions (semi-colon separated), will get latest if empty
        resourcegroup: ""              # the resource group of the KeyVault
        subscriptionid: ""             # the subscription ID of the KeyVault
        tenantid: ""            # the tenant ID of the KeyVault
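
A quick way to verify the setup (assuming the manifest above has been filled in with your Key Vault details and the pod's aadpodidbinding label matches the Selector in the binding) is to deploy the pod and read the mounted secret from the FlexVolume mount path:

kubectl apply -f nginx-pod.yaml

# confirm the pod is running
kubectl get pod nginx-flex-kv-podid

# read the Key Vault object mounted at /kvmnt (replace <secret-name> with your keyvaultobjectnames value)
kubectl exec -it nginx-flex-kv-podid -- cat /kvmnt/<secret-name>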

Points to remember when implementing Azure AD Pod Identity in a cluster
  • Azure AD Pod Identity is currently bound to the default namespace. Deploying an Azure Identity and its binding to other namespaces will not work!
  • Pods from all namespaces can be executed in the context of an Azure Identity deployed to the default namespace (related to the previous point).
  • Every pod developer can add the aadpodidbinding label to his/her pod and use your Azure Identity.
  • Azure Identity Binding does not use the default Kubernetes label-selection mechanism.

There is little doubt that data will guide the next generation of business strategy and will bring new efficiencies across industries. But for that to happen, organizations must be able to extract insights from their data.

Qubole is an ideal platform to activate end-to-end data processing in organizations. It combines all types of data – structured, unstructured, and legacy offline data – into a single data pipeline and turns it into rich insights by adding AI, ML, and deep analytics tools to the mix.

It scales seamlessly to accommodate more users and new data without adding administrative overheads and lowers cloud costs significantly. Simply put, Qubole is a platform that puts big data on the cloud to power business decisions based on real-time analytics.

At CloudIQ Technologies, our data experts have deployed Qubole’s cloud-native data systems for many of our clients, and the results have been outstanding. Here is an article from one of our data engineers that provides an overview of how to set up Qubole to use the AWS environment and create and run Spark clusters.

AWS Access Configuration:

In order for Qubole to create and run a cluster, we have to grant Qubole access to our AWS environment. We can grant access based on a key or a role. We will use role-based authentication.

Step 1: Login to Qubole

Step 2: Click on the menu at the top left corner and select “Account Settings” under the Control Panel.

Step 3: Scroll down to Access settings

Step 4: Switch Access mode to “IAM Role”

Step 5: Copy the Trusted Principal AWS Account ID and External ID

Step 6: Use the copied values to create a QuboleAccessRole in the AWS account (using the cloudformation template)

Step 7: Copy the Role ARN of the QuboleAccessRole and enter it in the Role ARN field

Step 8: Enter the S3 bucket location where the Qubole metadata will be stored in the “Default Location” field.

Step 9: Click Save

Spark Cluster
Create a cluster

The below steps will help create a new Spark cluster in Qubole.

Step 1: Click on the top-left dropdown menu and select “Cluster”

Step 2: Click on “+New” button

Step 3: Select “Spark” and click “Next”

Step 4: Provide a name for the cluster in the “Cluster Labels” field

Step 5: Select the version of Spark to run, Master Node Type, Worker Node Type, Minimum and Maximum nodes

Step 6: Select Region as us-west-2

Step 7: Select Availability Zone as us-west-2a

Step 8: Click “Next”

Step 9: In the Composition screen, you can select the type of nodes that will be spun up.

Step 10: In the Advanced Configuration screen, proceed to EC2 settings

Step 11: Enter “QuboleDualIAMRole” in the “Instance Profile” field

Step 12: Select “AppVPC” in VPC field

Step 13: Select “AppPrivateSNA” under Subnet field

Step 14: Enter the ip address of the Bastion node in the “Bastion Node” field

Step 15: Scroll to the bottom and enter “AppQuboleClusterSG” (security group for the cluster) in the “Persistent Security Group” field

Step 16: Click on “Create”

Run a cluster

To start a cluster, click on the dropdown menu on the top left corner and select cluster. Now click on “Start” button next to the cluster that needs to be started. A cluster is also automatically started when a job is submitted for the cluster.

Submit a job

One of the simplest ways to run a spark job is to submit it through the workbench. You can navigate to the workbench from the drop-down menu at the top left corner. In the workbench, click on “+Create New”. Then select “Spark” next to the title of the job. Once you select Spark, an optional drop-down appears where you can choose “Python”. In the last drop-down menu, select the spark cluster where you want to execute the job. If this cluster is not active, it will be activated automatically. Enter your spark job in the window below. When complete, click on “Run” to run the job.

Airflow Cluster

Airflow scheduler can be used to run various jobs in a sequence. Let’s take a look at configuring an Airflow cluster in Qubole.

Setting up DataStore

The first step in creating an airflow cluster is to set up a datastore. Make sure that the MySQL db is up and running and contains a database for airflow. Now, select “Explore” from the dropdown menu at the top left corner. On the left hand menu, drop down the selection menu showing “Qubole Hive” and select “+Add Data Store”

In the new screen, provide a name for the data store. Select “MySQL” as the database type. Enter the database name for the airflow database (The database should already be created in MySQL). Enter the host address as “hmklabsbienvironment.cq8z1kp7ikd8.us-west-2.rds.amazonaws.com”. Enter the username and password. Make sure to select “Skip Validation”. Since the MySQL db is in a private VPC, Qubole does not have access to it and will not be able to validate.

Configuring Airflow Cluster

Step 1: Click on the top left drop-down menu and select “Cluster”

Step 2: Click on “+New” button

Step 3: Select “Airflow” in the type of cluster and click “Next”

Step 4: Give a cluster name. Select the Airflow version and node type.

Step 5: Select the datastore which points to the MySQL

Step 6: Select us-west-2 as the Region

Step 7: Select us-west-2a as the Availability zone

Step 8: Click next to go to Advanced Configuration

Step 9: Select AppVPC as the VPC

Step 10: Select AppPrivateSNA as the Subnet

Step 11: Enter the Bastion Node information

Step 12: Scroll to the bottom and enter AppQuboleClusterSG as the Persistent Security Groups

Step 13: Click on create

Once the cluster is created, you can run it by clicking on “Start” next to the cluster’s name.

Containers are being embraced at a breakneck speed – developers love them, and they are great for business because they deliver speed and scale in a cost-efficient manner. So much so, that container technology seems to be overtaking VMs – especially with container orchestration tools like Kubernetes, making them simpler to manage and extracting higher efficiency and speed from them.

Kubernetes cluster architecture

Kubernetes provides an open-source platform for simplifying multi-cloud environments. The disparities between different cloud providers are a roadblock for developers and Kubernetes helps by streamlining and standardizing container-based applications.

Kubernetes clusters are the architectural foundation that drives this simplicity and makes it possible for users to get the functionality they need at scale and with ease. Here are some of the functionalities of Kubernetes –

  • Kubernetes distributes workload efficiently across all open resources and reduces traffic spikes or outages.
  • It simplifies application deployment regardless of the size of the cluster
  • It automates horizontal scaling
  • It monitors against app failure with constant node and container health checks and performs self-healing and replication to resolve any failure issues.

All this makes the work of developers faster and frees up their time and attention from trivial repetitive tasks allowing them to build applications better and faster! For the organization, the benefits are three-fold – higher productivity, better products and, finally, cost efficiencies.

Let’s move to the specifics now and find out how to set up a Kubernetes Cluster on the RHEL 7.6 operating system on AWS.

Prerequisites:
  • You should have a VPC available.
  • A subnet within that VPC, into which you will place your cluster.
  • You should have Security Groups for the Control Plane Load Balancer and the Nodes created.
  • You should have created the Control Plane Load Balancer.
  • A bastion host, or jump box, with a public IP within your VPC from which you can secure shell into your VMs.
  • A pem file for your AWS region, which you will use to secure shell into your VMs.
Creating the IAM Roles

You will need to create 2 IAM roles: one for the Master(s), and one for the worker nodes.

Master Role

To create an IAM role, go to the IAM (Identity and Access Management) page in the AWS console. On the left-hand menu, click ‘Roles’. Then click ‘Create Role’.

Select the service that will use this role. By default, it is EC2, which is what we want. Then click “Next: Permissions”.

Click ‘Create Policy’. The Create Policy page opens in a new tab.

Click on the ‘JSON tab’. Then paste this json into it:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:*",
                "elasticloadbalancing:*",
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:BatchGetImage",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:UpdateAutoScalingGroup"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

This json defines the permissions that your master nodes will need.

Click ‘Review Policy’. Then give your policy a name and a description.

Click ‘Create Policy’ and your policy is created!

Back on the Create Role page, refresh your policy list, and filter for the policy you just created. Select it and click ‘Next: Tags’.

You should add 2 tags: Name, with a name for your role, and KubernetesCluster, with the name of the cluster that you are going to create. Click ‘Next: Review’.

Give your role a name and a description. Click ‘Create Role’ and your role is created!

Node Role

For the node role, you will follow similar steps, except that you will use the following json:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:Describe*",
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:BatchGetImage"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
Provisioning the VMs
Provisioning the Master

We will use RHEL 7.6 for our cluster because RHEL 8.0 uses iptables v1.8, and kube-proxy does not work well with iptables v1.8. However, kube-proxy works with iptables v1.4, which is installed on RHEL 7.6. We will use the x86_64 architecture.

Log into the AWS console. Go to the EC2 home page and click ‘Launch Instance’. We will search under Community AMIs for our image.

Click ‘Select’. Then choose your instance type. T2.medium should suffice for a Kubernetes master. Click ‘Next: Configure Instance Details’.

We will use only 1 instance. For an HA cluster, you will want more. Select your network and your subnet. For the purposes of this tutorial, we will enable auto-assigning a public IP.  In production, you would probably not want your master to have a public IP.  But you would need to make sure that your subnet is configured correctly with the appropriate NAT and route tables. Select the IAM role you created. Then click ‘Next: Add Storage’.

The default, 10 GB of storage, should be adequate for a Kubernetes master. Click ‘Next: Add Tags’.

We will add 3 tags: Name, with the name of your master; KubernetesCluster, with the name of your cluster; and kubernetes.io/cluster/<name of your cluster>, with the value owned. Click ‘Next: Configure Security Group’.

Select “Select an existing security group” and select the security group you created for your Kubernetes nodes. Click ‘Review and Launch’.

Click ‘Launch’. Select “Choose an Existing Key Pair”. Select the key pair from the drop-down. Check the “I acknowledge” box. You should have the private key file saved on the machine from which you plan to secure shell into your master; otherwise you will not be able to ssh into the master! Click ‘Launch Instances’ and your master is created.

Provisioning the Auto Scaling Group

Your worker nodes should be behind an Auto Scaling group. Under Auto Scaling in the left-hand menu of the AWS console, click ‘Auto Scaling Groups’. Click ‘Create Auto Scaling Group’. On the next page, click ‘Get Started’.

Under “Choose AMI”, select RHEL 7.6 x86_64 under Community AMIs, as you did for the master.

When choosing your instance type, be mindful of what applications you want to run on your Kubernetes cluster and their resource needs. Be sure to provision a size with sufficient CPUs and memory.

Under “Configure Details”, give your autoscaling group a name and select the IAM role you configured for your Kubernetes nodes.

When selecting your storage size, be mindful of the storage requirements of your applications that you want to run on Kubernetes. A database application, for example, would need plenty of storage.

Select the security group that you configured for Kubernetes nodes.

Click ‘Create Launch Configuration’. Then select your key pair as you did for the master. Click ‘Create Launch Configuration’ and you are taken to the ‘Configure Auto Scaling Group Details’ page. Give your group a name. Select a group size. For our purpose, 2 nodes will suffice. Select the same subnet on which you placed your master. Click ‘Next: Configure Scaling policies’.

For this tutorial, we will select “Keep this group at its initial size”. For a production cluster with variability in usage, you may want to use scaling policies to adjust the capacity of the group. Click ‘Next: Configure Notifications’.

We will not add any notifications in this tutorial. Click ‘Next: Configure Tags’.

We will add 3 tags: Name, with the name of your nodes; KubernetesCluster, with the name of your cluster; and kubernetes.io/cluster/<your cluster name>, with the value owned. Click ‘Review’.

Click Create Auto Scaling Group and your auto-scaling group is created!

Installing Kubernetes

Specific steps need to be followed to install Kubernetes. Run the following steps as sudo on your master(s) and worker nodes.

 # add docker repo

yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

# install container-selinux

 yum install -y http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.107-1.el7_6.noarch.rpm

# install docker

yum install docker-ce

# enable docker

systemctl enable --now docker

# create Kubernetes repo. The 2 urls after gpgkey have to be on 1 line.

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

# configure selinux

setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

# install kubelet, kubeadm, kubectl, and Kubernetes-cni. We found that version 1.13.2 works well with RHEL 7.6.

yum install -y kubelet-1.13.2 kubeadm-1.13.2 kubectl-1.13.2 kubernetes-cni-0.6.0-0.x86_64 --disableexcludes=kubernetes --nogpgcheck

# enable kubelet

systemctl enable --now kubelet

# Run the following command as a regular user.

sudo usermod -a -G docker $USER
Creating the Kubernetes Cluster

First, add your master(s) to the control plane load balancer as follows. Log into the AWS console, EC2 service, and on the left-hand menu, under Load Balancing, click ‘Load Balancers’. Select your load balancer and click the Instances tab in the bottom window. Click ‘Edit Instances’.

Select your master(s) and click ‘Save’.

We will create the Kubernetes cluster via a config file. You will need a token, the master’s private DNS name taken from the AWS console, the Load Balancer’s IP, and the Load Balancer’s DNS name. You can generate a Kubernetes token by running the following command on a machine on which you have installed kubeadm:

kubeadm token generate

To get the load balancer’s IP, you must execute a dig command. You install dig by running the following command as sudo:

yum install bind-utils

Then you execute the following command:

dig +short <load balancer dns>

Then you create the following yaml file:

 ---
 apiVersion: kubeadm.k8s.io/v1beta1
 kind: InitConfiguration
 bootstrapTokens:
 - groups:
   - "system:bootstrappers:kubeadm:default-node-token"
   token: "<token>"
   ttl: "0s"
   usages:
   - signing
   - authentication
 nodeRegistration:
   name: "<master private dns>"
   kubeletExtraArgs:
     cloud-provider: "aws"
 ---
 apiVersion: kubeadm.k8s.io/v1beta1
 kind: ClusterConfiguration
 kubernetesVersion: "v1.13.2"
 apiServer:
   timeoutForControlPlane: 10m0s
   certSANs:
   - "<Load balancer IPV4>"
   extraArgs:
     cloud-provider: "aws"
 clusterName: kubernetes
 controlPlaneEndpoint: "<load balancer DNS>:6443"
 controllerManager:
   extraArgs:
     cloud-provider: "aws"
     allocate-node-cidrs: "false"
 dns:
   type: CoreDNS

You then bootstrap the cluster with the following command as sudo:

kubeadm init --config kubeadm.yaml --ignore-preflight-errors=all

I had a timeout error on the first attempt, but the command ran successfully the second time. Make a note of the output because you will need it to configure the nodes.

You then configure kubectl as follows:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

After this there are some components that need to be installed on Kubernetes on AWS:

# Grant the “admin” user complete access to the cluster

kubectl create clusterrolebinding admin-cluster-binding --clusterrole=cluster-admin --user=admin

# Add-on for networking providers, so pods can communicate. 
# Currently either calico.yaml or weave.yaml

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/weave.yaml

# Install the Kubernetes dashboard

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/dashboard.yaml

# Install the default StorageClass

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/default.storageclass.yaml

# Set up the network policy blocking the AWS metadata endpoint from the default namespace.

kubectl apply -f https://aws-quickstart.s3.amazonaws.com/quickstart-vmware/scripts/network-policy.yaml

Then you have to configure kubelet arguments:

sudo vi /var/lib/kubelet/kubeadm-flags.env

And add the following parameters:

--cloud-provider=aws --hostname-override=<the node name>

After editing the kubeadm-flags.env file:

sudo systemctl restart kubelet

Finally, you have to patch your master node with the provider ID. That way, any load balancers you create for this node will automatically add the node as an AWS instance:

kubectl patch node <node name> -p '{"spec":{"providerID":"aws:///<availability zone>/<instance ID>"}}'

You can join worker nodes to the cluster by running the following command as sudo, which should have been printed out after running kubeadm init on the master:

kubeadm join <load balancer dns>:6443 --token <token> --discovery-token-ca-cert-hash <discovery token ca cert hash> --ignore-preflight-errors=all

Be sure to configure kubelet arguments on each node and patch them using kubectl as you did for the master.

Your Kubernetes cluster on AWS is now ready!

As one of the most popular cloud platforms, Microsoft Azure is the backbone of thousands of businesses – 80% of the Fortune 500 companies are on Microsoft cloud, and Azure holds 31% of the global cloud market! 

Microsoft’s customer-centricity shines through the entire Azure stack, and a critical part of it is the Azure Alerts that allows you to monitor the metrics and log data for the whole stack across your infrastructure, application, and Azure platform.

Azure Alerts offers organizations and IT managers access to faster alerts and a unified monitoring platform. Once set up, the software requires minimal technical effort and gives the IT team a centralized monitoring experience through a single dashboard that manages ALL the alerts.

The platform is designed to provide low latency log alerts and metric alerts which gives IT managers the opportunity to identify and fix production and performance issues almost in real-time. Naturally, in complex IT environments, this level of control and overview of the IT infrastructure leads to higher productivity and reduced costs.

Here are more details of how Azure Alerts work

Alerts proactively notify us when important conditions are found in our monitoring data. They allow us to identify and address issues before users notice them.

This diagram represents the flow of alerts

Alerts can be created from

  • Metric values of resources
  • Log search queries results
  • Activity log events
  • Health of the underlying Azure platform

This is what a typical alert dashboard for a single/multiple subscriptions looks like

You can see 5 entities on the dashboard
  • Severity
    • Defines how severe the alert is and how quickly action needs to be taken.
  • Total alerts
    • Total number of alerts received aggregated by the severity of the alert.
  • New
    • The issue has just been detected and hasn’t yet been reviewed.
  • Acknowledged
    • An administrator has reviewed the alert and started working on it.
  • Closed
    • The issue has been resolved. After an alert has been closed, you can reopen it by changing it to another state.

We will now take you through the steps to create Metric Alerts, Log Search Query Alerts, Activity Log Alerts, and Service Health Alerts.

STEPS TO CREATE A METRIC ALERT

Go to Azure monitor. Click ‘alerts’ found on the left side.

To create a new alert, click on the ‘+ New alert rule’.

After clicking ‘+ New alert rule’ this window will appear.

To select a resource, click ‘Select’. A window will be displayed where you can choose the resource by filtering on subscription, resource type, and the location of the resource. Then select ‘Done’ at the bottom.

Once the resource is selected, configure the condition. Click ‘Select’ to configure the signal. The signal type will show both metrics and the activity log for the selected resource.

Select the signal for which you need to create the alert. After selecting the signal, a new window is displayed where you need to describe the alert logic.

Set the threshold above which the alert should be triggered. Specifying an explicit threshold value applies to static thresholds only.

For dynamic threshold, the value is determined by continuously learning the data of the metric series and trying to model it using a set of algorithms and methods. It detects patterns in the data such as seasonality (Hourly / Daily / Weekly) and can handle noisy metrics (such as machine CPU or memory) as well as metrics with low dispersion (such as availability and error rate).

Now select an ‘action group’ if you already have one or create a new action group.

  • Provide a name for the action group.
  • Select the subscription and resource group where the action group needs to be deployed.
  • If you have selected the action type as Email/SMS/Push/Voice, another window will be displayed to configure the necessary details like email ID and contact number for SMS and voice notifications; provide the information and select OK.
  • You can see the different action types available in the image below.

Input the alert details: alert rule name, description of the alert, and severity of the rule. Select ‘enable rule upon creation’.

Finally click ‘create alert rule’. It might take some time to create the alert and for it to start working.
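
The same metric alert can also be created from the command line. The snippet below is a hedged sketch using the Azure CLI; the resource ID, action group, threshold, and names are placeholders to replace with your own values.

# Create a metric alert that fires when average CPU usage exceeds 80%
az monitor metrics alert create \
  --name HighCpuAlert \
  --resource-group MyResourceGroup \
  --scopes /subscriptions/<subscription-id>/resourceGroups/MyResourceGroup/providers/Microsoft.Compute/virtualMachines/MyVM \
  --condition "avg Percentage CPU > 80" \
  --description "Alert when average CPU usage is above 80 percent" \
  --action MyActionGroup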

HOW TO CREATE LOG SEARCH QUERY ALERTS

Repeat steps 1 to 3 as outlined in the Metric alert creation. In step 3 select the resource type as “log analytics workspace”.

Now select the condition. You can choose a “Log (saved query)” signal or select “Custom log search”.

Select the signal name as per your requirements; a new signal window will be displayed containing the attributes corresponding to the selected signal.

Here we have selected a saved query, which provides the result shown as above.

  1. A rule created based on the “Number of results” and the threshold provided, or
  2. A rule based on metric measurement and the threshold provided, where the alert triggers based on
    • Total breaches, or
    • Continuous breaches of the threshold provided in the metric measurement.

Provide the evaluation period and the frequency (in minutes) at which the alert rule should be evaluated.

Follow steps 8 and 9 as outlined in the Metric alert creation.

STEPS TO CREATE ACTIVITY LOG ALERTS

Repeat steps 1 to 3 as outlined in the Metric alert creation. In step 3 select the resource type as “log analytics workspace”.

On selecting the condition, click ‘Monitor Service’ and select ‘Activity Log – Administrative’.

Here we have selected all administrative operations as the signal.

Now configure the alert logic. The event level has many types. Select as per your requirement and click ‘Done’ at the bottom.

Follow steps 8 and 9 as outlined in the Metric alert creation.

STEPS TO CREATE A SERVICE HEALTH ALERT

You can receive an alert when Azure sends service health notifications to your Azure subscription. You can configure the alert based on:

  • The class of service health notification (Service issues, Planned maintenance, Health advisories).
  • The subscription affected.
  • The service(s) affected.
  • The region(s) affected.

Log in to the Azure portal and search for ‘Service Health’, or find it on the left side. Click ‘Service Health’.

The Service Health service is now visible; select ‘Health alerts’ in the Alerts section.

Select Create service health alert and fill in the fields.

Select the subscription and services for which you need to be alerted.

Select the ‘region’ where your resources are located and select the ‘Event type’. Azure provides several event types.

Select all the event types so that you can receive alerts irrespective of the event type.

Follow steps 8 and 9 as outlined in the Metric alert creation, then click on ‘Create Alert rule’. The service health alert can be seen in the Health Alerts section.

Kubernetes is the reigning market leader when it comes to container orchestration! Any organization working with the container ecosystem is either already using Kubernetes or considering it. However, despite the undoubted ease and speed Kubernetes brings to the container ecosystem, it also needs specialized expertise to deploy and manage.

Many organizations consider the DIY approach to Kubernetes and if you have an in-house IT team with the requisite experience or if your requirements are large enough to justify the cost of hiring a dedicated Kubernetes team – then an internal Kubernetes strategy could certainly be beneficial.

However, if you don’t fall into the category mentioned above, then managed Kubernetes is the smartest and most cost-effective way ahead. With professionals in the picture, you can be assured of getting a long-term strategy, seamless implementation, and dedicated ongoing service, which will

  • reduce deployment time
  • provide 24×7 support
  • handle all upgrades and fixes
  • troubleshoot as and when needed

Kubernetes solution providers offer a wide range of services – from fully managed to bare bone implementation to preconfigured Kubernetes environments on SaaS models to training for your in-house staff.

Look at your operational needs and your budget and explore the market for Kubernetes services options before you pick the service and the digital partner that ticks all your boxes.   

Meanwhile, do look at our tutorial on troubleshooting Kubernetes deployments.

Kubernetes deployment issues are not always easy to troubleshoot. In some cases, the errors can be resolved easily, and, in some cases, detecting errors requires us to dig deeper and run various commands to identify and resolve the issues.

The first step is to list all pods after installing your application. The following command lists all pods in all namespaces.

kubectl get pods -A

If you find any issues on the pod status, you can then use kubectl describe, kubectl logs, kubectl exec commands to get more detailed information.

Debugging Pods
Pod Status Shows ImagePullBackOff or ErrImagePull

This status indicates that your pod could not run because the pod could not pull the image from the container registry. To confirm this, run the kubectl describe command along with the pod identifier to display the details.

kubectl describe pod <pod-identifier>

This command will provide more information about the issue.

  • Image name or tag incorrect.
    • Check the image name and tag and try to pull the image manually on the host using docker pull to verify.
  • Authentication failure related to Container registry.
    • Check the secrets, roles, service principal related to your container registry and try to pull the image manually on the host using docker pull to verify.
docker pull <image-name:tag> 
Pod Status Shows Waiting

This status indicates your pod has been scheduled to a worker node, but it can’t run on that machine. To confirm this, run the kubectl describe command along with the pod identifier to display the details.

kubectl describe pod <pod-identifier> -n <namespace>

The most common causes related to this issue are

  • Image name or tag incorrect.
    • Check the image name and tag and try to pull the image manually on the host using docker pull to verify.
  • Authentication failure related to Container registry.
    • Check the secrets, roles, service principal related to your container registry and try to pull the image manually on the host using docker pull to verify.
Pod Status Shows Pending or CrashLoopBackOff

This status indicates your pod could not be scheduled on a node for various reasons, like resource constraints (insufficient CPU or memory resources) or volume mounting issues.  To confirm this, run the kubectl describe command along with the pod identifier to display the details.

kubectl describe pod <pod-identifier> -n <namespace>

This command will provide more information about the issue. Most common issues are

  • Insufficient resources
    • If resources are insufficient, clean up existing resources or scale your nodes (vertically or horizontally) to increase the resources.
  • Volume mounting
    • Check your volume’s mounting definition and storage classes.
  • Using hostPort
    • When you bind a pod to a hostPort, there are a limited number of places that the pod can be scheduled. In most cases, hostPort is unnecessary; try using a Service object to expose your pod. If you do require hostPort, then you can only schedule as many pods as there are nodes in your Kubernetes cluster.
Pod is crashing or unhealthy

Sometimes the scheduled pods are crashing or unhealthy.  Run kubectl logs to find the root cause.

kubectl logs <pod_identifier> -n <namespace>

If you have multiple containers, run the following command to find the root cause.

kubectl logs <pod_identifier> -c <container_name> -n <namespace>

If your container has previously crashed, you can access the previous container’s crash log with:

kubectl logs --previous <pod_identifier> -c <container_name> -n <namespace>

If your pod is running but with a 0/1 or 0/2 ready state (in case you have multiple containers in your pod), then you need to verify the readiness. Check the health check (readiness probe) in this case.

Most common issues are

  • Application issues
    • Run the below command to check the logs.
               kubectl logs <pod_identifier> -c <container_name> -n <namespace>
  • Run the below command to verify the events.
               kubectl describe <pod_identifier> -n <namespace>
  • Readiness probe health check failed
    • Check the health check (readiness probe) in this case. Also, check the READY column of the kubectl get pods output to find out if the readiness probe is passing.
    • Run the below command to check the logs.
         kubectl logs <pod_identifier> -c <container_name> -n <namespace>
  • Run the below command to verify the events.
         kubectl describe <pod_identifier> -n <namespace>
  • Liveness probe health check failed
    • Check the health check (liveness probe) in this case. Also, check the RESTARTS column of the kubectl get pods output to find out if the liveness probe is passing.
    • Run the below command to check the logs.
         kubectl logs <pod_identifier> -c <container_name> -n <namespace>
  • Run the below command to verify the events.
         kubectl describe <pod_identifier> -n <namespace>
Pod is running but has application issues

In some cases, the pods are running, but the output of the application is incorrect. In this case, you should run the following to find the root cause.

  • Run the below command and identify the issue.
kubectl logs <pod_identifier> -c <container_name> -n <namespace>
  • If you are interested in the last n lines of logs run
kubectl logs <pod_identifier> -c <container_name> --tail <n-lines> -n <namespace>
  • Run the commands inside the container using
kubectl exec -it <pod_identifier> -c <container_name> -n <namespace> -- /bin/bash

Run commands like ‘curl’, ‘ps’, or ‘ls’ to troubleshoot the issue after you get into the container.

Pod is running and working but cannot access through services

In some cases, the pods are working as expected but cannot access through the services. Most common causes of this issue are

  • Service not registered properly
    • Check that the service exists, describe the service, and validate the pod selectors by running the following commands.
kubectl get svc
kubectl describe svc <svc-name>
kubectl get endpoints
  • Run the following command to verify the pod selector
kubectl get pods --selector=name={name},{label-name}={label-value}
  • The service may be deployed in a different namespace.
    • Verify that the pod’s containerPort matches up with the Service’s targetPort
  • Service is registered properly but has a DNS issue
    • Get into the container using the exec command and run nslookup using the following commands
kubectl get endpoints
kubectl exec -it <pod_identifier> -c <container_name> -- /bin/bash
nslookup <service-name>
  • If you have any issues running curl or nslookup, deploy a debugging pod using the image yauritux/busybox-curl in the same namespace to verify. Run the following command:
kubectl run --generator=run-pod/v1 -it --rm <name> --image=yauritux/busybox-curl -n <namespace>
  • Run the following to verify within the container
curl http://<servicename>
telnet <service-ip> <service-port>
nslookup <servicename>

On July 21, 2015, when Kubernetes v1.0 was released, it redefined the container technology landscape. All the bottlenecks of application deployment, scaling, and management in containers were made simpler and faster with intelligent automation.

Container technology made software development more agile and brought in resource efficiency – they made scaling smoother and faster. However, they also need to be tracked, monitored, and managed, which is where container orchestration and Kubernetes come in.

Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management.  It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation.

What does Kubernetes do?

Kubernetes allows you to leverage the full potential of your container ecosystem. With automation, it streamlines container workflow and frees up the IT team to concentrate on their core areas of application development by removing the need to manage container networking, storage, logs, alerting, etc. Overall, it automates deploying, scaling, and managing of containerized applications on a cluster of servers.

Key Benefits of Kubernetes

Flexibility for scaling – it enables horizontal infrastructure scaling by quickly adding or removing new servers. Kubernetes has the option of automating vertical scaling, too, by taking into account application-provided metrics.

Health check and self-healing designed in Kubernetes allow it to maintain high availability of applications and infrastructure.

Enhanced deployment speed – with automated rollouts and rollbacks, canary deployments, and wide-ranging support for a variety of programming languages, Kubernetes speeds up the process of building, testing, and deploying new software.

Let’s understand more about Kubernetes concepts

1. Kubernetes Objects

Kubernetes contains several abstractions that represent the state of our system: deployed containerized applications and workloads, their associated network and disk resources, and other information about what our Kubernetes cluster is doing. These abstractions are represented by objects in the Kubernetes API.  The basic Kubernetes objects include:

  • Pod
  • Service
  • Volume
  • Namespace

In this blog, we will look at the Pod and Service objects.

2. Pods

A pod is a higher level of abstraction grouping containerized components.  A pod consists of one or more containers that are guaranteed to be co-located on the host machine and can share resources.  The basic scheduling unit in Kubernetes is a pod.  The host machines on which the pods are scheduled are called Nodes.

3. Pod definition yaml

Kubernetes objects are mostly created by declaring their configuration in a yaml file.
Given below is a yaml file to define a simple pod.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 8080

In the above yaml file,

  1. apiVersion – denotes which version of the Kubernetes API we are using to create this object.
  2. kind – specifies what kind of object we want to create.  For the Pod object, the apiVersion is always v1.
  3. metadata – has data to uniquely identify the object (name) and labels.
  4. Labels are key/value pairs that are attached to objects.  Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users but do not directly imply semantics to the core system.  So, in the above example, instead of “name: nginx” we can have “appname: nginx”, “name: mynginxapp” or anything we like.
  5. Spec – defines the object specification and differs for each object type.  For Pod object, the spec has an array of containers since a pod consists of one or more containers.
  6. For each container, we provide below attributes:
  • name – name of the container.  This can be different from name of pod and is not related to it.
  • Image – name of the docker image to be used to build this container
  • ports – the ports in this container to be exposed outside the pod.  Here we are running the nginx web-server on port 8080 and exposing it.

Suppose we have the above pod definition in a file named pod-definition.yaml, the pod is created by executing the below Kubernetes command:

$ kubectl create -f pod-definition.yaml
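
Once the pod is created, it can be inspected with the usual kubectl commands, for example:

$ kubectl get pods
$ kubectl describe pod nginx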

4. Pod communication and need for services

Each pod in Kubernetes is assigned a unique Pod IP address within the cluster, which allows applications to use ports without the risk of conflict. 

Within the pod, all containers can reference each other on localhost, but a container within one pod has no way of directly addressing another container within another pod; for that, it must use the Pod IP Address.

An application developer should never use the Pod IP Address though, to reference / invoke a capability in another pod, as Pod IP addresses are ephemeral – the specific pod that they are referencing may be assigned to another Pod IP address on restart.  Instead, we should use a reference to a Service, which holds a reference to the target pod at the specific Pod IP Address.

5. Services

In Kubernetes, a Service is an abstraction that defines a logical set of Pods and a policy by which to access them. The set of Pods targeted by a Service is usually determined by a selector.

Sample YAML for a service to expose the pod(s) which we created earlier is given below:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    name: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
      nodePort: 30230
  type: NodePort

This yaml file defines a service named my-service which is used to access the Pods which have a label ‘name: nginx’. 

  1. The selector field of the service must match the label field of the Pods to which we want to connect.
  2. There are 3 ports defined in above YAML file:
  • Port is the port number that makes a service visible to other services running within the same Kubernetes cluster.
  • Target Port is the port on the POD where the service is running.  This is an optional field; if not provided, Kubernetes assigns the same value as Port field
  • Node port is the port on which the service can be accessed by external users.  NodePort can only have values from 30000 to 32767.  If this optional field is not provided in the definition, Kubernetes automatically assigns a value for NodePort service.

To create the service object, enter the above yaml code in a file named service-defn.yaml and execute the command given below:

$ kubectl create -f service-defn.yaml
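
After the service is created, we can confirm that it has picked up the pod as an endpoint and reach it from outside the cluster through the NodePort (30230 in this example, assuming a node IP that is reachable from your machine):

$ kubectl get svc my-service
$ kubectl describe svc my-service
$ curl http://<node-ip>:30230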

6. Types of Services

In the above example, we have type: NodePort for the service.  The different values allowed for the type field are:

  1. ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default Service type.
  2. NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort).   A ClusterIP Service, to which the NodePort Service routes, is automatically created.  We will be able to contact the NodePort Service, from outside the cluster, by requesting <NodeIP>:<NodePort>
  3. LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
  4. ExternalName: Maps the Service to the contents of the externalName field (e.g., foo.bar.example.com), by returning a CNAME record with its value.  No proxying of any kind is set up.  We need CoreDNS version 1.7 or higher to use the ExternalName type.
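
As a brief illustration of the non-default types, here is a hedged sketch of a LoadBalancer variant of my-service and an ExternalName service; the external DNS name is a placeholder.

apiVersion: v1
kind: Service
metadata:
  name: my-service-lb
spec:
  type: LoadBalancer
  selector:
    name: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-external-service
spec:
  type: ExternalName
  externalName: foo.bar.example.com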

Many of you are running your mission-critical applications on containers, and if you haven’t already deployed Kubernetes to manage your container ecosystem, then chances are you soon will.

If you are considering a Kubernetes implementation, then there are several ways to go about it –

  • In-house Kubernetes deployment – if you have a large enough IT team with the requisite expertise in Kubernetes architecture and deployment, then getting your Kubernetes cluster up and running in-house is certainly a possibility. Kubernetes deployment is a complex process and requires a mix of specific skill sets. Also running and monitoring a Kubernetes platform requires the full-time services of a dedicated team, and your requirement must justify this additional cost.
  • SaaS Solutions for Kubernetes– if your business needs are specific and straightforward, then you can explore the market for pre-designed Kubernetes offerings on a SaaS payment model.
  • Fully outsourced (managed) Kubernetes services – if budget permits and your business demands, then bringing in the professionals is a safe and hassle-free solution. From infrastructure assessments to building a Kubernetes strategy to engineering, deploying, and managing enterprise-wide Kubernetes solutions – you can outsource your entire project to experts.
  • Many service providers like CloudIQ also offer day-to-day management and support as well as Kubernetes training to your IT staff to set up internal management expertise.

If the last decade of cloud has taught us anything, it is that when it comes to technology, bringing in professionals to do the job always turns out to be the best option in the long run. Kubernetes is a sophisticated platform that requires specialized competencies. Here is a look at one of our tutorials on Kubernetes Networking – how it all works under the hood.

KUBERNETES NETWORKING – DATA PLANE

In Kubernetes, applications run as a set of pods with their own IP address and port. Kubernetes provides an abstract way to expose the applications/pods as a network service. Various forms of the service abstraction include ClusterIP, NodePort, LoadBalancer & Ingress. When service requests enter the Kubernetes cluster, the service abstractions have to be directed to individual service endpoints of Pods. This data plane function is implemented using a Linux kernel feature – iptables.

Iptables is used to set up, maintain, and inspect the tables of IP packet filter rules in the Linux kernel. Several different tables may be defined. Each table contains a number of built-in chains and may also contain user-defined chains. Each chain is a list of rules which can match a set of packets. Each rule specifies what to do with a packet that matches. This is called a ‘target’, which may be a jump (-j) to a user-defined chain in the same table.

The service (SVC) to service endpoint (SEP) mappings are programmed using KUBE-SERVICES user-defined chains in the NAT (Network Address Translation) table. The contents of the iptables rules can be extracted using the “iptables-save” command.

# Generated by iptables-save v1.6.0 on Mon Sep 16 08:00:17 2019
*nat
:PREROUTING ACCEPT [1:52]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [23:1438]
:POSTROUTING ACCEPT [10:592]
:DOCKER - [0:0]
:IP-MASQ-AGENT - [0:0]

:KUBE-SERVICES - [0:0]

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES

Let’s consider the Services in the following example.

cloudiq@hubandspoke:~$ kubectl get svc --namespace=workshop-development

NAME: ciq-ingress-workshop-development-nginx-ingress-controller
TYPE: LoadBalancer
CLUSTER-IP: 192.168.5.65
EXTERNAL-IP: 10.82.0.97
PORT(S): 80:30512/TCP,443:31512/TCP
AGE: 19h

Here we have the following service abstractions that are defined.

LoadBalancerIP=10.82.0.97

NodePort=30512/31512

ClusterIP=192.168.5.65

The above services have to be translated to individual service endpoints. The rules performing matching and translation are programmed using custom chains in the NAT table of Ip Tables as below.

Let’s look for the LoadBalancer=10.82.0.97 service.

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep 10.82.0.97
-A KUBE-SERVICES -d 10.82.0.97/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:http loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-SXB4UOYSLPHVISJM
-A KUBE-SERVICES -d 10.82.0.97/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -m tcp --dport 443 -j KUBE-FW-JLRSZDR3OXJ4SUA2

Let’s look at the HTTPS service available on port 443.

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-FW-JLRSZDR3OXJ4SUA2

:KUBE-FW-JLRSZDR3OXJ4SUA2 - [0:0]
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-MARK-MASQ
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-MARK-DROP
-A KUBE-SERVICES -d 10.82.0.97/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -m tcp --dport 443 -j KUBE-FW-JLRSZDR3OXJ4SUA2

We see the NodePort & Cluster IP translation below. The service chains (SVC) point to two different service endpoints. In order to select between the two service endpoints, a random probability measure is calculated, and the appropriate SEP service endpoint is selected.

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-SVC-JLRSZDR3OXJ4SUA2

:KUBE-SVC-JLRSZDR3OXJ4SUA2 - [0:0]
-A KUBE-FW-JLRSZDR3OXJ4SUA2 -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https loadbalancer IP" -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-NODEPORTS -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https" -m tcp --dport 31512 -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-SERVICES -d 192.168.5.65/32 -p tcp -m comment --comment "workshop-development/ciq-ingress-workshop-development-nginx-ingress-controller:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-JLRSZDR3OXJ4SUA2
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-4R3FOXQSM5T2ZADC
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -j KUBE-SEP-PI7R3ONIYH4XJLMW

In the SEP service endpoints, the actual DNAT is performed.

cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-SEP-4R3FOXQSM5T2ZADC
:KUBE-SEP-4R3FOXQSM5T2ZADC - [0:0]
-A KUBE-SEP-4R3FOXQSM5T2ZADC -s 10.82.0.10/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-4R3FOXQSM5T2ZADC -p tcp -m tcp -j DNAT --to-destination 10.82.0.10:443
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-4R3FOXQSM5T2ZADC
cloudiq@hubandspoke:~$ cat ciq-dev-aks-iptables-save.output | grep KUBE-SEP-PI7R3ONIYH4XJLMW
:KUBE-SEP-PI7R3ONIYH4XJLMW - [0:0]
-A KUBE-SEP-PI7R3ONIYH4XJLMW -s 10.82.0.82/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-PI7R3ONIYH4XJLMW -p tcp -m tcp -j DNAT --to-destination 10.82.0.82:443
-A KUBE-SVC-JLRSZDR3OXJ4SUA2 -j KUBE-SEP-PI7R3ONIYH4XJLMW
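To make the probability-based selection above concrete, here is a small illustrative Python sketch (not part of kube-proxy itself) that simulates how the two KUBE-SEP rules share traffic: the first rule matches with probability 0.5, and everything that falls through hits the second, unconditional rule.

import random
from collections import Counter

# Pod endpoints taken from the KUBE-SEP chains shown above
ENDPOINTS = {
    "KUBE-SEP-4R3FOXQSM5T2ZADC": "10.82.0.10:443",
    "KUBE-SEP-PI7R3ONIYH4XJLMW": "10.82.0.82:443",
}

def pick_endpoint():
    # first rule: -m statistic --mode random --probability 0.5
    if random.random() < 0.5:
        return ENDPOINTS["KUBE-SEP-4R3FOXQSM5T2ZADC"]
    # fall-through to the second, unconditional rule
    return ENDPOINTS["KUBE-SEP-PI7R3ONIYH4XJLMW"]

counts = Counter(pick_endpoint() for _ in range(10000))
print(counts)   # roughly a 50/50 split across the two pod endpoints

Over many connections this yields an approximately even split, which is what the statistic match achieves inside the kernel.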

Automation Testing helps complete the entire software testing life cycle (STLC) in less time and improves the efficiency of the testing process.

Test Automation enables teams to verify functionality, test for regression and run simultaneous tests efficiently. In this article we will take a detailed look at the Automation Testing Tools available, standards and best practices to be followed during Test Automation.

Following the best practices of the Software Testing Life Cycle (unit testing, integration testing and system testing) ensures that the client gets the software as intended, without bugs. End-to-end testing is the methodology used to test whether the flow of an application performs as designed from start to finish. Carrying out end-to-end tests helps identify system dependencies and ensures that the right information flows across the various components of the system.

Ultimately Automation Testing increases the speed of test execution and the test coverage.

When to Choose Automation Testing
  • There is a lot of regression work
  • The GUI is stable, but functional changes are frequent
  • Requirements do not change frequently
  • Load and performance testing with many virtual users
  • Repetitive test cases that lend themselves well to automation and save time
  • Large projects
  • Projects that need to test the same areas repeatedly

Steps to Implement Automation Testing
  • Identify areas within software to automate
  • Choose the appropriate tool for test automation
  • Write test scripts
  • Develop test suites
  • Execute test scripts
  • Build result reports
  • Find possible bugs or performance issues
Choosing your Automation Testing Tool

The strategy to adopt test automation should clearly define when to opt for automation, its scope and selection of the right kind of tools for execution. And when it comes to tools the top ones to go for are

  • Cypress
  • Selenium
  • Protractor
  • Appium (Mobile)
Why Cypress?

Cypress is a JavaScript based testing framework built for the modern web. Cypress helps to create End-to-end tests, Integration tests and Unit tests. Cypress takes a different approach compared to other testing frameworks, since it’s executed in the same run loop as the application. It also leverages a Node.js server to handle any task that needs to happen outside of the browser. With its ability to understand everything happening inside and outside of the browser, it produces more consistent results.

Key Features of Cypress
  • Automatic Waiting – No need for adding wait and sleep.
  • Spies, Stubs, and Clocks – Verify and control the behaviour of functions, server responses, or timers.
  • Network traffic control and monitoring – Easily control, stub, and test edge cases without involving your server. You can stub network traffic however you like.
  • Consistent Results – Cypress architecture doesn’t use Selenium or WebDriver. It is fast, consistent and delivers reliable, flake-free tests.
  • Screenshots and Videos – View screenshots taken automatically on failure, or videos of your entire test suite when run from the CLI.
Azure CICD Setup with Cypress

Cypress runs on most popular CI providers, including:

Azure DevOps / VSTS CI / TeamFoundation
BitBucket
CircleCI
Docker
GitLab
Jenkins
TravisCI

Azure DevOps – Steps to Integrate Cypress Automation Tests
  • Pre-Build Testing
    • Install the Node module and run the application in test mode
    • Run the tests
    • Publish the test results
  • Cypress Containerization
    • Build the Docker container of Cypress
    • Push the image to the container registry
    • Publish the build

Before we get started, here are the basic Cypress commands used in this setup.

Clean up the old results:
$ rm -rf cypress/reports/

Run the Cypress application with the required spec file:
$ cypress run --spec "cypress/integration/**/*.spec.ts"   # mention your spec file

Configure the Mocha reporter path for publishing test results:
--reporter junit --reporter-options 'mochaFile=cypress/reports/test-output-[hash].xml,toConsole=true'

Uninstall the application:
$ npm uninstall cypress-multi-reporters; npm uninstall cypress-promise; npm uninstall cypress

Pre-Build Testing

It is critical to test the application before the Build, Deployment or Release. Essentially the process involves regression and smoke testing. And don’t forget the sanity checks before the build is deployed in the staging environment.

Cypress comes in handy for testing Angular / JavaScript applications before they are deployed to the staging or production environment.

Install the Node module and run application in test mode

Install the required node modules of the application, then run the application in test mode.

$ npm install --save-dev start-server-and-test

$ start-server-and-test start http://localhost:4200

Publish the test results

The results of the Cypress test execution are stored in the specified path and are added to the Azure DevOps test results. Cypress supports the JUnit, Mocha and Mochawesome test result reporter formats and provides options to create customised test results and to merge all the test results as well.

Cypress Containerization

Cypress supports docker containerization and that makes it easy to set it up in a cluster environment like AKS. The Cypress base images are available at the link below.

https://github.com/cypress-io/cypress-docker-images

Copy the package.json and UI source code to the app folder and run the Cypress tests. The following pipeline steps are used to run the Docker container and execute the tests.

  - script: |
      # start a container from the Cypress base image (placeholder name), commit it as an image, then clean up
      docker run -d -it --name cypressName <cypress-base-image> bash
      docker commit -p cypressName cypressName:cypressImageTag
      docker stop cypressName
      docker rm -f cypressName

  - script: docker tag cypressName:cypressImageTag
    displayName: Tag Cypress image

  - task: Docker@1
    displayName: Push image To Registry
    inputs:
      command: push
      azureSubscriptionEndpoint: azureSubscriptionEndpoint
      azureContainerRegistry: $(azureContainerRegistry)
      imageName: acrImageName:BuildId

  - script: sudo rm -rf /test-results/*
    displayName: Removing Previous Results

  - task: ShellScript@2
    displayName: 'Bash Script - cypress base image post-deployment'
    inputs:
      scriptPath: ./cypress-deployment.sh
      args: $(azureRegistry) $(cypressImageName) $(azureContainerValue) $(CYPRESS_OPTIONS)
    continueOnError: true

  - task: PublishTestResults@1
    displayName: 'Publish Test Results ./test-results-*.xml'
    inputs:
      testResultsFiles: './test-results-*.xml'

cypress-base-image-post-deployment.sh

docker run -v $systemSourceDirectory:/app/cypress/reports --name vca-arp-ui \
  $cypress_Latestimage npx cypress run $cypressOptions bash

Now the container should be set up on your local machine and start running your specs.

Cypress is simple and easily integrates with your CI environment. Apart from the browser support, Cypress reduces the efforts of manual testing and is relatively faster when compared to other automation testing tools.

In this article we will discuss how to create security groups in AWS for Kubernetes. The goal is to set up a Kubernetes cluster on AWS EC2, having provisioned your virtual machines. You are going to need two security groups: one for the control plane load balancer, and another for the VMs.

Creating a Security Group through the AWS Console

Prerequisite: You should have a VPC (virtual private cloud) set up.

Log into the AWS EC2 (or VPC) console. On the left-hand menu, under Network and Security, click Security Groups.

Click on Create Security Group.

Enter a Name and a Description for your Security Group. Then select your VPC from the drop-down menu. Click Add Rule.

You will need 2 TCP ingress rules, one over port 6443, another over port 443. We are choosing to allow the Source from anywhere. In production you may want to restrict the CIDR, IP, or security group that can reach this load balancer.

We are choosing to leave the outbound rules as default, in which all outbound traffic is permitted.

Click Create and your security group is created!

Select your security group in the console. You may want to give your security group a Name (in addition to the Group Name that you specified when creating it).

But you are not done yet: you must add tags to your security group. These tags will alert AWS that this security group is to be used for Kubernetes. Click on the Tags tab at the bottom of the window. Then click Add/Edit Tags.

You will need 2 tags:
  • Name: KubernetesCluster. Value: <the name of your Kubernetes cluster>
  • Name: kubernetes.io/cluster/<the name of your Kubernetes cluster>. Value: owned

Click Save and your tags are saved!
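If you prefer to script these steps rather than use the console, the same security group, ingress rules and tags can be created with boto3. The sketch below is a hedged example; the region, VPC id, group name and cluster name are placeholders, not values from this article.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region

# create the security group for the control plane load balancer
sg = ec2.create_security_group(
    GroupName="k8s-api-lb",                      # placeholder group name
    Description="Control plane load balancer",
    VpcId="vpc-0123456789abcdef0",               # placeholder VPC id
)
sg_id = sg["GroupId"]

# ingress on 6443 and 443 from anywhere (tighten the CIDR for production)
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": p, "ToPort": p,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
        for p in (6443, 443)
    ],
)

# the two tags that mark the group as belonging to the Kubernetes cluster
ec2.create_tags(
    Resources=[sg_id],
    Tags=[
        {"Key": "KubernetesCluster", "Value": "my-cluster"},                 # placeholder cluster name
        {"Key": "kubernetes.io/cluster/my-cluster", "Value": "owned"},
    ],
)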

Creating a Security Group for the Virtual Machines

Follow the steps above to create a security group for your virtual machines. Here are the ports that you will need to open for your control plane VMs:

The master node:
  1. 22 for SSH from your bastion host
  2. 6443 for the Kubernetes API Server
  3. 2379-2380 for the ETCD server
  4. 10250 for the Kubelet health check
  5. 10252 for the Kube controller manager
  6. 10255 for the read only kubelet API
The worker nodes:
  1. 22 for SSH
  2. 10250 for the kubelet health check
  3. 30000-32767 for external applications. However, it is more likely that you will expose external applications outside the cluster via load balancers, and restrict access to these ports to within your VPC.
  4. 10255 for the read only kubelet API

We have chosen to combine the master and the worker rules into one security group for convenience. You may want to separate them into 2 security groups for extra security.

Follow the step-by-step instructions detailed above and you will have successfully created AWS Security Groups for Kubernetes.

What is New Relic Synthetics?

New Relic Synthetics is a set of automated, scriptable tools to monitor websites, critical business transactions and API endpoints. Detailed individual results from each monitor run can also be viewed. With access to New Relic Insights, in-depth queries of data from Synthetics monitors can be run, and custom dashboards can be created.

Features of New Relic Synthetics:
  • Easy to set up real time instrumentation and analytics
  • REST API functions
  • Real browsers
  • Comparative charting with Browser
  • New Relic Insights support
  • Advanced scripted monitoring
  • Global test coverage
Different types of Synthetic Monitor:

There are four types of monitor.

a) Ping monitor:

Ping monitors are the simplest type of monitor. These monitors are used to check if an application is online. The Synthetics ping monitor uses a simple Java HTTP client to make requests to your site.

b) API tests:

API tests are used to monitor API endpoints. This can ensure that the app server works in addition to the corresponding website. New Relic uses the “http request module” internally to make HTTP calls to the API endpoint and validate the results.

c) Browser:

Simple browser monitors are essentially pre-built scripted browser monitors. These monitors make a request to the site using an instance of Google Chrome.

d) Script_Browser:

Scripted browser monitors are used for more sophisticated, customized monitoring. A custom script can be created to navigate to the website, take specific actions and ensure that the specific resources are present.

Creation of Synthetic Monitor:

API Test Monitor:

Step 1:

  • Login to new relic monitor

Step 2 – Create synthetic monitor

  • Click “Synthetics” in the New Relic dashboard, then click “Add new” in the top-right corner.

Step 3: Enter the Required Details

  • Select “API Test” as the monitor type.
  • Enter the monitor name under Details.
  • Select one location for the monitor under Monitoring locations.
  • Set the Schedule – set the frequency for monitoring. For example, on selecting a frequency of 10 minutes, the monitor would run and check the endpoint every 10 minutes.
  • Set Notification – notifications to email ids can be set with the help of a new alert policy or appended to an existing alert policy. For an existing alert policy, click on “Add to an existing alert policy” and select the policy. For a new policy, the email address and policy name have to be given. There are three types of policy:
    1. By Policy – only one open incident at a time for this alert policy.
    2. By Condition – only one open incident at a time per alert condition.
    3. By Condition and Entity – open an incident every time a condition is violated.
  • Only on completing the above steps can the script be written, by clicking on “Write your script”.
  • Click on “Create monitor” once the monitor creation steps are done.
PING Monitor:

Step 1:

  • Login to new relic monitor

Step 2 – Create synthetic monitor

  • Click “Synthetics” in the New Relic dashboard, then click “Add new” in the top-right corner.

Step 3: Enter the Required Details

  • Select “Ping” as the monitor type.
  • Enter the monitor name under Details.
  • Enter the URL and the expected response for that URL.
  • Select one location for the monitor under Monitoring locations.
  • Set the Schedule – set the frequency for monitoring. For example, on selecting a frequency of 10 minutes, the monitor would run and check the URL every 10 minutes.
  • Set Notification – notifications to email ids can be set with the help of a new alert policy or appended to an existing alert policy. For an existing alert policy, click on “Add to an existing alert policy” and select the policy. For a new policy, the email address and policy name have to be given. There are three types of policy:
    1. By Policy – only one open incident at a time for this alert policy.
    2. By Condition – only one open incident at a time per alert condition.
    3. By Condition and Entity – open an incident every time a condition is violated.
  • On completing the above steps, the Ping monitor gets created by clicking on “Create monitor”.

Synthetic Monitor Functionality:

API Test:
Pass Scenario:

The below script is used to store data using the POST method and then pass the value to a callback function. A callback function is simply a function passed into another function as an argument.

Here, the callback function has three arguments: error, response and body.

In the below script, the values “gear” and “10” are compared with the JSON body values. Both values are the same, hence no assertion error is triggered.

In case of a value mismatch, an assertion error is thrown.

Failure scenario:

In the below script, the values do not match with the JSON body value. Hence an assertion error is thrown.

In case of an assertion error, an alert will be sent to the email id given in the notification channel. The assertion error will not be resolved until the value is changed back to “10”.

Mail Alert: (Ping & API Test)

The error log can be seen as below:

After the error is fixed, an update is sent to the notification channel.

Delete a Monitor: (Ping & API Test)
  • From the Monitors list, select the monitor which needs to be deleted.
  • In the selected monitor, under Settings, click on General to view the monitor settings page.
  • Select the trash icon; an alert popup will appear. Click “OK” in the popup and the monitor will be deleted.

Introduction:

Cloud Foundry is an open source cloud application platform with a container-based architecture. It provides cloud instances and is mainly used to deploy applications directly into a cloud environment. Instead of running the app separately, the CF CLI (Command Line Interface) tool is used to deploy, test, configure and manage apps on CF.

Features of Cloud Foundry:
  • An open source, cloud-native platform
  • Fast and easy to build, test, deploy, manage and scale apps
  • Works with any language or framework
  • Highly adaptable
  • Lets you see the running status of apps
  • Apps can be scaled up or down and debugged on CF
How to interact with CF?
  • Command Line Interface (CLI): from terminal / command prompt
  • IDE plugins
Org and App Space Roles:

CF uses role-based access control, with each role granting permissions in either an organization or an application space.

Organisation:
  • An Organisation or org represents an organisational account and groups together users, resources, applications, and environments.
  • Each organisation has a resource quota, and its members share the same resources and domains.
  • Organisations segregate tenants in a Cloud Foundry installation.

To list all orgs that the user has access to, the below command can be given in the terminal.

cf orgs
Space:
  • An organisation can have separate spaces for development, staging and production versions of the apps.
  • A space can also have its own quota.
  • A space is the shared location for developing and running apps.
  • Every application and service is scoped to a space.

To list all spaces in the current org:

cf spaces
Relationship between org, space and Apps:
 
(Diagram: an org contains multiple spaces, and each space contains its own apps and service instances.)
Before pushing the app into Cloud Foundry, ensure that you:
  • Log into Cloud Foundry using the cf login command
    • cf login -a API-URL
  • Enter the correct credentials when prompted for username and password
  • Select the org and space where the app should be pushed
  • Then push the application using cf push
How to deploy an app into cf?

To deploy an application, you need to push its code to the Cloud Foundry instance. The push command is used to push the application onto Cloud Foundry. The arguments may vary depending on the application type. However, it is best practice to specify all the arguments in a file called manifest.yml.

It provides consistency and reproducibility. An app can specify its service instance dependencies in the manifest.yml file, and it will automatically bind to those service instances.

# Start a new app called “myapp”
# If there’s a manifest.yml in the current folder,
# the config will be read from there
cf push
Manifest Format

Manifests are written in YAML. The manifest below illustrates some YAML conventions, as follows:

  • The manifest file begins with three dashes.
  • The applications block begins with a heading followed by a colon.
  • The app name is preceded by a single dash and one space.
  • Subsequent lines in the block are indented two spaces to align with name
Sample manifest.yml

---
applications:
- name: my-app
  memory: 512M
  instances: 2
  buildpack: nodejs_buildpack

Buildpack:
  • A Cloud Foundry component that resolves app’s runtime dependencies
  • It provides framework and run time support for applications.
  • It is used to determine what dependencies to download
  • It is used to tell how to configure applications to communicate with different services.
  • It is used to compile or prepare the application for launch.
What happens when you push an app using cf push?
  • Upload: app files are sent to CF
  • Staging: an executable artifact (a droplet) is created
  • Running: the app starts on an app host

The app receives web requests (if it binds to a TCP port).

List of cf commands:
cf command – Purpose
cf target – Sets or views the targeted organization or space
cf stop – Stops an application
cf start – Starts an app
cf set-env – Sets an environment variable for an application (cf set-env <var_name> <var_value>)
cf services – Lists all of the services that are available in the current space
cf restart – Stops all instances of the app, then starts them again. This causes downtime.
cf restage – Recreates the app’s executable artifact using the latest pushed app files and the latest environment (variables, service bindings, buildpack, stack, etc.). This action causes app downtime.
cf rename – Renames an app
cf push – Deploys a new application (cf push <app_name>)
cf marketplace – Lists all of the services that are available in the marketplace
cf logs – Displays the STDOUT and STDERR log streams of an application (cf logs <app_name>)
cf login -a – Logs in to CF (cf login -a <api_url>)
cf help – Shows help
cf events – Displays runtime events that are related to an application (cf events <app_name>)
cf delete – Deletes an existing application (cf delete <app_name>)
cf create-space – Creates a space (cf create-space <space_name>)
cf bind-service – Binds an existing service instance to your application
cf apps – Lists all of the applications that you deployed in the current space. The status of each application is also displayed.
cf api – Shows the current API endpoint
cf -v – Displays the version of the Cloud Foundry command line interface

What is Cross Browser Testing?

Cross Browser Testing is a type of functional test to check whether a web application works as expected on different browsers.

(Or)

Cross-browser testing is basically running the same set of test cases multiple times on different browsers.

The two main intents of cross-browser testing are to check:
  1. Functionality of the page – does every feature behave the same way in each browser?
  2. Appearance of the page in different browsers – is it the same, is it different, is one better than the other, etc.

Note: In recent years, mobile browsers have also been included in the scope of cross-browser testing.

When this testing can be started?

Any testing reaps the best benefits when it is done early on. Therefore, the industry recommendation is to start with it as soon as the page designs are available. Finding and fixing bugs at an early stage is very cost effective; finding bugs after release or completion of the application is not.

Cross Browser testing through Manual:

Sure, it can be done manually. First, the business needs to identify all browsers that the application must support. The tester then needs to run all the test cases against every identified browser and observe whether the appearance and functionality are the same.

Through manual testing, it is not possible to cover many browsers and their major versions. So, performing cross-browser testing manually is costly and time-consuming.

In an Agile world, it is not advisable to do all cross-browser testing manually.

Cross Browser testing through Automation:

As stated above, Cross-browser testing is basically running the same set of test cases multiple times on different browsers. This type of repeated task is best suited for automation. Thus, it’s more cost and time effective to perform this testing by using tools.

Selenium for Cross Browser Testing:

Selenium is well known for automated testing of web-based applications. Just by changing the browser used for running the test cases, Selenium makes it very easy to run the same test cases multiple times on different browsers, as the short example below shows.

Note: In the rest of this blog, we are going to see how Selenium can be used for cross-browser testing.
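As a quick illustration, here is a minimal sketch using Selenium's Python bindings (Selenium 4.x assumed, with Chrome and Firefox installed locally); the site URL and the title check are placeholders for your own test case.

from selenium import webdriver

def check_home_page(driver):
    # the same test logic runs unchanged against every browser
    driver.get("https://example.com")      # placeholder for the site under test
    assert "Example" in driver.title       # placeholder assertion
    driver.quit()

for browser in ("chrome", "firefox"):
    driver = webdriver.Chrome() if browser == "chrome" else webdriver.Firefox()
    check_home_page(driver)
    print(browser, "passed")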

Advantages of choosing Selenium:
  • Open source
  • Supports programming languages like Java, Perl, Python, C#, Ruby, Groovy, Java Script, etc
  • Platform Independent: Supports (OS) like Windows, Mac, Linux, UNIX, etc.
  • Supports multiple browsers namely, Internet Explorer, Chrome, Firefox, Opera, Safari, etc
  • Ease of implementation
  • Reusability

By using TestNG along with Selenium Grid, we can achieve parallel test execution on different browsers on different machines. Let's look at TestNG and Selenium Grid through the following topics.

TestNG:

TestNG is an automation testing framework in which NG stands for “Next Generation”. TestNG is inspired by JUnit and uses annotations (@). Default Selenium tests do not generate test results in a proper format; using TestNG we can generate well-formatted test reports.

Why TestNG?
  • Multiple test cases can be grouped easily by defining them in a testng.xml file, in which you can set priorities that decide which test case should be executed first.
  • The same test case can be executed multiple times without loops, just by using the keyword ‘invocation count’.
  • Using TestNG, you can execute multiple test cases on multiple browsers
  • It can be easily integrated with tools like Maven, Jenkins, etc.
Selenium Grid

Selenium Grid is a part of the Selenium Suite that specialises in running multiple tests across different browsers, operating systems and machines. You can connect to it with Selenium Remote by specifying the browser, browser version, and operating system you want.

Components of Selenium Grid
Hub:

In Selenium Grid, the hub is the machine that acts as the central point into which we load our tests. The hub also acts as a server, making it the central point of control for the network of test machines. A Selenium Grid has only one hub, and it is the master of the network.

Nodes

In Selenium Grid, a node is a test machine that opts to connect with the hub. The hub uses these test machines to run tests on. A Grid network can have multiple nodes. Nodes can run on different platforms, i.e. different operating systems and browsers, and a node does not need to run on the same platform as the hub.

Advantages of Selenium Grid
  • Selenium Grid allows running multiple tests across different web browsers, operating systems, and machines. This ensures compatibility of the application under test across multiple combinations of web browsers, operating system, and hardware architecture
  • It speeds up test suite completion time as it can run multiple tests in parallel. For example, if we have 10 nodes and we need to execute a test suite of 50 tests, then it will take roughly one-tenth of the time compared to a single machine running the suite without Selenium Grid.
Disadvantage of Selenium Grid
  • Extra cost to project as it requires additional machines as Nodes
Grid Code Snippets:
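The snippet below is a hedged sketch using Selenium's Python bindings (Selenium 4.x assumed) rather than Java/TestNG: the same test is sent to a Selenium Grid hub (placeholder URL) and executed in parallel on two browsers.

from concurrent.futures import ThreadPoolExecutor
from selenium import webdriver

GRID_URL = "http://localhost:4444/wd/hub"      # placeholder hub address

def run_test(options):
    # each call gets its own remote session on whichever node matches the requested browser
    driver = webdriver.Remote(command_executor=GRID_URL, options=options)
    try:
        driver.get("https://example.com")      # placeholder site under test
        assert "Example" in driver.title
        print(driver.capabilities["browserName"], "passed")
    finally:
        driver.quit()

# run the same test on Chrome and Firefox nodes in parallel
with ThreadPoolExecutor(max_workers=2) as pool:
    pool.map(run_test, [webdriver.ChromeOptions(), webdriver.FirefoxOptions()])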

What is Jenkins?

Jenkins is an open source automation tool written in Java with plugins built for Continuous Integration purpose. Jenkins is used to build and test your software projects continuously making it easier for developers to integrate changes to the project, and making it easier for users to obtain a fresh build. It also allows you to continuously deliver your software by integrating with a large number of testing and deployment technologies.

With Jenkins, organizations can accelerate the software development process through automation. Jenkins integrates development life-cycle processes of all kinds, including build, document, test, package, stage, deploy, static analysis and much more.

Jenkins achieves Continuous Integration with the help of plugins. Plugins allow the integration of various DevOps stages. If you want to integrate a particular tool, you need to install the plugins for that tool, for example Git, Maven 2 project, Amazon EC2, HTML publisher, etc.

Advantages of Jenkins include:

  • It is an open source tool with great community support.
  • It is easy to install.
  • It has 1000+ plugins to ease your work. If a plugin does not exist, you can code it and share with the community.
  • It is free of cost.
  • It is built with Java and hence, it is portable to all the major platforms
What is Continuous Integration?

Continuous Integration is a development practice in which the developers are required to commit changes to the source code in a shared repository several times a day or more frequently. Every commit made in the repository is then built. This allows the teams to detect the problems early. Apart from this, depending on the Continuous Integration tool, there are several other functions like deploying the build application on the test server, providing the concerned teams with the build and test results etc.

Continuous Integration with Jenkins
  • First, a developer commits the code to the source code repository. Meanwhile, the Jenkins server checks the repository at regular intervals for changes.
  • Soon after a commit occurs, the Jenkins server detects the changes that have occurred in the source code repository. Jenkins will pull those changes and will start preparing a new build.
  • If the build fails, then the concerned team will be notified.
  • If the build is successful, then Jenkins deploys the build to the test server.
  • After testing, Jenkins generates feedback and notifies the developers about the build and test results.
  • It will continue to check the source code repository for changes made in the source code and the whole process keeps on repeating.
Jenkins Distributed Architecture

Jenkins uses a Master-Slave architecture to manage distributed builds. In this architecture, Master and Slave communicate through TCP/IP protocol.

Jenkins Master

Your main Jenkins server is the Master. The Master’s job is to handle:

  • Scheduling build jobs.
  • Dispatching builds to the slaves for the actual execution.
  • Monitoring the slaves (possibly taking them online and offline as required).
  • Recording and presenting the build results.
  • A Master instance of Jenkins can also execute build jobs directly.
Jenkins Slave

A Slave is a Java executable that runs on a remote machine. Following are the characteristics of Jenkins Slaves:

  • It listens for requests from the Jenkins Master instance.
  • Slaves can run on a variety of operating systems.
  • The job of a Slave is to do as they are told to, which involves executing build jobs dispatched by the Master.
  • You can configure a project to always run on a particular Slave machine, or a particular type of Slave machine, or simply let Jenkins pick the next available Slave.
What is a Jenkins pipeline?

A pipeline is a collection of jobs that brings the software from version control into the hands of the end users by using automation tools. It is a feature used to incorporate continuous delivery in our software development workflow.

Over the years, there have been multiple Jenkins pipeline releases including, Jenkins Build flow, Jenkins Build Pipeline plugin, Jenkins Workflow, etc. What are the key features of these plugins?

  • They represent multiple Jenkins jobs as one whole workflow in the form of a pipeline.
  • What do these pipelines do? These pipelines are a collection of Jenkins jobs which trigger each other in a specified sequence.

Lets look at an example. Suppose I’m developing a small application on Jenkins and I want to build, test and deploy it. To do this, I will allot 3 jobs to perform each process. So, job1 would be for build, job2 would perform tests and job3 for deployment. I can use the Jenkins build pipeline plugin to perform this task. After creating three jobs and chaining them in a sequence, the build plugin will run these jobs as a pipeline.

This approach is effective for deploying small applications. But what happens when there are complex pipelines with several processes (build, test, unit test, integration test, pre-deploy, deploy, monitor) running 100’s of jobs?

The maintenance cost for such a complex pipeline is huge and increases with the number of processes. It also becomes tedious to build and manage such a vast number of jobs. To overcome this issue, a new feature called Jenkins Pipeline Project was introduced.

The key feature of this pipeline is to define the entire deployment flow through code. What does this mean? It means that all the standard jobs defined by Jenkins are manually written as one whole script and they can be stored in a version control system. It basically follows the ‘pipeline as code’ discipline. Instead of building several jobs for each phase, you can now code the entire workflow and put it in a Jenkinsfile. Below is a list of reasons why you should use the Jenkins Pipeline.

Jenkins Pipeline Advantages
  • It models simple to complex pipelines as code by using Groovy DSL (Domain Specific Language)
  • The code is stored in a text file called the Jenkinsfile which can be checked into a SCM (Source Code Management)
  • Improves user interface by incorporating user input within the pipeline
  • It is durable in terms of unplanned restart of the Jenkins master
  • It can restart from saved checkpoints
  • It supports complex pipelines by incorporating conditional loops, fork or join operations and allowing tasks to be performed in parallel
  • It can integrate with several other plugins
What is a Jenkinsfile?

A Jenkinsfile is a text file that stores the entire workflow as code and it can be checked into a SCM on your local system. How is this advantageous? This enables the developers to access, edit and check the code at all times.

The Jenkinsfile is written using the Groovy DSL and it can be created through a text/groovy editor or through the configuration page on the Jenkins instance. It is written based on two syntaxes, namely:

  • Declarative pipeline syntax
  • Scripted pipeline syntax

Declarative pipeline is a relatively new feature that supports the pipeline as code concept. It makes the pipeline code easier to read and write. This code is written in a Jenkinsfile which can be checked into a source control management system such as Git.

The scripted pipeline, on the other hand, is the traditional way of writing the code; in this pipeline, the Jenkinsfile is written on the Jenkins UI instance. Though both pipelines are based on the Groovy DSL, the scripted pipeline uses stricter Groovy-based syntax because it was the first pipeline to be built on the Groovy foundation. Since raw Groovy scripting was not desirable to all users, the declarative pipeline was introduced to offer a simpler and more opinionated Groovy syntax.

The declarative pipeline is defined within a block labelled ‘pipeline’, whereas the scripted pipeline is defined within a ‘node’ block.

An example Jenkinsfile looks like this:

pipeline {
    environment {
        BUILD_SCRIPTS_GIT="http://10.100.100.10:7990/scm/~myname/mypipeline.git"
        BUILD_SCRIPTS='mypipeline'
        BUILD_HOME='/var/lib/jenkins/workspace'
    }
    agent any
    stages {
        stage('Checkout: Code') {
            steps {
                sh "mkdir -p $WORKSPACE/repo;\
                    git config --global user.email '[email protected]';\
                    git config --global user.name 'myname';\
                    git config --global push.default simple;\
                    git clone $BUILD_SCRIPTS_GIT repo/$BUILD_SCRIPTS"
                sh "chmod -R +x $WORKSPACE/repo/$BUILD_SCRIPTS"
            }
        }
        stage('Yum: Updates') {
            steps {
                sh "sudo chmod +x $WORKSPACE/repo/$BUILD_SCRIPTS/scripts/update.sh"
                sh "sudo $WORKSPACE/repo/$BUILD_SCRIPTS/scripts/update.sh"
            }
        }
    }
    post {
        always {
            cleanWs()
        }
    }
}

The above Jenkinsfile does the following:

  • sets up environment variables
  • pulls data down from a git repo
  • sets it up in a Jenkins workspace
  • runs a script under scripts/
  • once complete, cleans up the workspace (successful or not)
Pipeline concepts
  • Pipeline

This is a user defined block which contains all the processes such as build, test, deploy, etc. It is a collection of all the stages in a Jenkinsfile. All the stages and steps are defined within this block. It is the key block for a declarative pipeline syntax.

  • Node

A node is a machine that executes an entire workflow. It is a key part of the scripted pipeline syntax.

There are various mandatory sections which are common to both the declarative and scripted pipelines, such as stages, agent and steps that must be defined within the pipeline. These are explained below:

  • Agent

An agent is a directive that can run multiple builds with only one instance of Jenkins. This feature helps to distribute the workload to different agents and execute several projects within a single Jenkins instance. It instructs Jenkins to allocate an executor for the builds.

A single agent can be specified for an entire pipeline or specific agents can be allotted to execute each stage within a pipeline. Few of the parameters used with agents are:

  • Any

Runs the pipeline/ stage on any available agent.

  • None

This parameter is applied at the root of the pipeline and it indicates that there is no global agent for the entire pipeline and each stage must specify its own agent.

  • Label

Executes the pipeline/stage on the labelled agent.

  • Docker

This parameter uses docker container as an execution environment for the pipeline or a specific stage. In the below example I’m using docker to pull an ubuntu image. This image can now be used as an execution environment to run multiple commands.

  • Stages

This block contains all the work that needs to be carried out. The work is specified in the form of stages. There can be more than one stage within this directive. Each stage performs a specific task. In the following example, I’ve created multiple stages, each performing a specific task.

  • Steps

A series of steps can be defined within a stage block. These steps are carried out in sequence to execute a stage. There must be at least one step within a steps directive. In the following example I’ve implemented an echo command within the build stage. This command is executed as a part of the ‘Build’ stage.

Continuous Integration (CI) is a development practice where developers integrate code into a shared repository frequently, preferably several times a day. Each integration can then be verified by an automated build and automated tests. While automated testing is not strictly part of CI it is typically implied.

One of the key benefits of integrating regularly is that you can detect errors quickly and locate them more easily. As each change introduced is typically small, pinpointing the specific change that introduced a defect can be done quickly.

In recent years CI has become a best practice for software development and is guided by a set of key principles. Among them are revision control, build automation and automated testing.

Benefits and Advantages of Continuous Integration

Continuous Integration has many benefits. A good CI setup speeds up your workflow and encourages the team to push every change without being afraid of breaking anything. There are more benefits to it than just working with a better software release process. Continuous Integration brings great business benefits as well.

  • Reduces the time and effort for integrations of different code changes
  • Enables a quick feedback mechanism on every change
  • Allows earlier detection and prevention of defects
  • Helps collaboration between team members so recent code is always shared
  • Reduces manual testing effort
  • Building features more incrementally saves time on the debugging side so you can focus on adding features
  • First step into fully automating the whole release process
  • Prevents divergence in different branches as they are integrated regularly
Continuous Integration Tools

Jenkins

Jenkins is a cross-platform open source CI tool written in Java. It offers configuration through both the GUI interface and the console commands. Jenkins is a very flexible tool to use because it offers an extension of features through plugins. Its plugin list is very broad, and one can easily add their own plugins to that list. Furthermore, Jenkins can distribute software builds and test loads on several machines.

Travis CI

Travis CI is an open source CI service that is free for all open source projects hosted on GitHub. Since Travis CI is hosted, it is platform independent. It is configured using a .travis.yml file, which contains the build instructions. Travis CI supports a variety of software languages, with ready-made build configurations for each of them. Travis CI uses virtual machines to build applications.

TeamCity

TeamCity is a sophisticated Java-based CI tool offered by JetBrains. It supports Java, .NET and Ruby platforms. TeamCity has a range of free plugins available, developed both by JetBrains and third parties. It also offers integration with several IDEs including Eclipse, IntelliJ IDEA and Visual Studio. Moreover, TeamCity allows simultaneous running of multiple builds and tests in different platforms and environments.

GitLab CI

GitLab CI is hosted on the free hosting service GitLab.com, and it offers Git repository management function with features such as, access control, bug tracking, and code reviewing. GitLab CI is completely unified with GitLab and it can easily be used to link projects using the GitLab API. GitLab CI process builds are coded in the Go language and can execute on several operating systems such as, Windows, Linux, Docker, OSX, and FreeBSD.

CircleCI

CircleCI is a cloud-hosted CI tool with tight GitHub integration. It supports several languages, including Java, Python, Ruby/Rails, Node.js, PHP, Scala and Haskell. It offers services based on containers. CircleCI offers one container free, and any number of projects can be built on it. It offers up to five levels of parallelization (1x, 4x, 8x, 12x and 16x), so a maximum parallelization of 16x can be achieved in one build. CircleCI also supports the Docker platform.

Bamboo

Bamboo is a CI tool developed by Atlassian. Bamboo is available in two versions, cloud and server. For the cloud version, Atlassian offers hosting service with the help of Amazon EC2 account. For the server version, self-hosting needs to be done. Bamboo supports well known Atlassian products, JIRA and BitBucket.

Machine Learning

Artificial Intelligence

Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions) and self-correction.

Machine Learning

Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

In Traditional Programming, data and program are run on the computer to produce the output. In Machine Learning, data and output are run on the computer to create a program. The program can be used in traditional programming.

Machine learning algorithms are often categorized as supervised or unsupervised.

Supervised Learning

Supervised learning is learning in which we teach or train the machine using data that is well labelled, meaning some data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (the set of training examples) and produces a correct outcome from the labelled data.

Classification algorithms and regression algorithms are types of supervised learning. Classification algorithms are used when the outputs are restricted to a limited set of values. For a classification algorithm that filters emails, the input would be an incoming email, and the output would be the name of the folder in which to file the email. For an algorithm that identifies spam emails, the output would be the prediction of either “spam” or “not spam”, represented by the Boolean values true and false. Regression algorithms are named for their continuous outputs, meaning they may have any value within a range. Examples of a continuous value are the temperature, length, or price of an object.

Unsupervised Learning

Unsupervised learning is the training of machine using information that is neither classified nor labelled and allowing the algorithm to act on that information without guidance. Here the task of machine is to group unsorted information according to similarities, patterns and differences without any prior training of data. The most common unsupervised learning method is cluster analysis or clustering, which is used for exploratory data analysis to find hidden patterns or grouping in data.

Some simple Machine Learning algorithms

Linear Regression

Here, we establish a relationship between independent and dependent variables by fitting the best line. It is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on a continuous variable(s).

The below model is used to predict ice cream sales based on the temperature in a city.

We need a weight (w) and a bias (b) to fit a straight line (y = wx + b), and this can be represented diagrammatically as given below:

Above diagram is the simplest Neural Network. A neural network is a system of hardware and/or software patterned after the operation of neurons in the human brain.

Logistic Regression

Logistic Regression is a classification algorithm used to estimate discrete binary values (like 0/1, yes/no, true/false) based on a given set of independent variables. Typically, this involves fitting a curve that separates 2 distinct classes of data points.

The neural network for logistic regression has multiple weights / biases as inputs and 2 output nodes, as shown below:
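As a complementary illustration, the following scikit-learn sketch fits a logistic regression on a made-up binary data set (hours studied versus pass/fail); the data and the query point are purely for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])   # hours studied
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])                   # 0 = fail, 1 = pass

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[3.5]]))         # predicted class (0 or 1)
print(clf.predict_proba([[3.5]]))   # probability of each class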

Deep Learning

Deep learning is a specific method of machine learning, and it’s based primarily on the use of neural networks.

In traditional supervised machine learning, systems require an expert to use his or her domain knowledge to specify the information (called features) in the input data that will best lead to a well-trained system. In Deep Learning, rather than specifying the features in our data that we think will lead to the best classification accuracy, we let the machine find this information on its own. Often, it can look at the problem in a way that even an expert wouldn’t have been able to imagine.

Neural Network Terminology

Activation function

The activation function of a node defines the output of that node, or “neuron”, given an input or set of inputs. This output is then used as input for the next node and so on until a desired solution to the original problem is found. Some of the commonly used activation functions are given below
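For reference, here are NumPy definitions of a few commonly used activation functions (the original figure is not reproduced here):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # squashes input into (0, 1)

def tanh(x):
    return np.tanh(x)                  # squashes input into (-1, 1)

def relu(x):
    return np.maximum(0, x)            # zero for negative inputs, identity otherwise

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))
print(tanh(x))
print(relu(x))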

Input / Output / Hidden Layers

Simply as the name suggests, the input layer is the one which receives the input and is essentially the first layer of the network. The output layer is the one which generates the output, or the final layer of the network. The processing layers are the hidden layers within the network. These hidden layers perform specific tasks on the incoming data and pass on the output they generate to the next layer. The input and output layers are the ones visible to us, while the intermediate layers are hidden.

Forward propagation

Forward Propagation refers to the movement of the input through the hidden layers to the output layers. In forward propagation, the information travels in a single direction FORWARD. The input layer supplies the input to the hidden layers and then the output is generated. There is no backward movement.

Cost / Loss function

When we build a network, the network tries to predict the output as close as possible to the actual value. We measure this accuracy of the network using the loss function. The loss function tries to penalize the network when it makes errors. Our objective while running the network is to increase our prediction accuracy and to reduce the error, hence minimizing the loss function. The most optimized output is the one with the least value of the loss function. If we define the loss function to be the mean squared error, it can be written as –

C = (1/m) ∑ (y − a)², where m is the number of training inputs, a is the predicted value and y is the actual value of that example.

The learning process revolves around minimizing the cost.

Gradient Descent

Gradient descent is an optimization algorithm for minimizing the cost. To think of it intuitively, while climbing down a hill you should take small steps and walk down instead of just jumping down at once. Therefore, what we do is, if we start from a point x, we move down a little i.e. delta h, and update our position to x-delta h and we keep doing the same till we reach the bottom. Consider bottom to be the minimum cost point.

Mathematically, to find the local minimum of a function one takes steps proportional to the negative of the gradient of the function.
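The following bare-bones sketch applies gradient descent to the straight-line model y = wx + b with the mean squared error cost C = (1/m) ∑ (y − a)² described above, using toy temperature and ice-cream-sales values similar to the example later in this article (the feature scaling and learning rate are choices made for this sketch, not something prescribed here).

import numpy as np

x = np.array([58, 62, 52, 60, 66, 74, 68, 80, 76, 74, 64], dtype=float)
y = np.array([215, 325, 185, 332, 406, 522, 412, 614, 544, 445, 408], dtype=float)

# scale the feature so a simple fixed learning rate behaves well
x = (x - x.mean()) / x.std()

w, b = 0.0, 0.0
learning_rate = 0.1
m = len(x)

for _ in range(1000):
    a = w * x + b                           # current predictions
    dw = (-2.0 / m) * np.sum((y - a) * x)   # dC/dw
    db = (-2.0 / m) * np.sum(y - a)         # dC/db
    w -= learning_rate * dw                 # step against the gradient
    b -= learning_rate * db

print(w, b)   # fitted slope and intercept (on the scaled feature)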

Learning Rate

The rate at which we descend towards the minimum of the cost function is the learning rate. We should choose the learning rate very carefully: it should neither be so large that the optimal solution is missed, nor so small that it takes forever for the network to converge.

Backpropagation

When we define a neural network, we assign random weights and bias values to our nodes. Once we have received the output for a single iteration, we can calculate the error of the network. This error is then fed back to the network along with the gradient of the cost function to update the weights of the network. These weights are then updated so that the errors in the subsequent iterations are reduced. This updating of weights using the gradient of the cost function is known as back-propagation.

Steps in training a Neural Network
  • Initialize weights and biases.
  • Forward propagation: using the input X, weights W and biases b, for every layer we compute Z and A, the linear and non-linear activations. At the final layer, we compute f(A^(L-1)), which could be a sigmoid, softmax or linear function of A^(L-1), and this gives the prediction y_hat.
  • Compute the loss function: this is a function of the actual label y and predicted label y_hat. It captures how far off our predictions are from the actual target. Our objective is to minimize this loss function.
  • Backward propagation: in this step, we calculate the gradients of the loss function f(y, y_hat) with respect to A, W, and b, called dA, dW and db. Using these gradients, we update the values of the parameters from the last layer to the first.
  • Repeat steps 2–4 for n iterations/epochs until we feel we have minimized the loss function, without overfitting the training data.
Machine Learning using Python

Simple Machine Learning models like Linear Regression can be trained using the python library scikit-learn. Neural Networks are built and trained using the libraries Keras, TensorFlow or PyTorch.

In the simple example below, we are building a linear regression model to predict ice cream sales based on temperature. 80% of the available data is used for training the model and the remaining 20% is used for testing it.

  
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# load the dataset
ice_cream_data = {
    'Temperature_in_Fahrenheit': [58, 62, 52, 60, 66, 74, 68, 80, 76, 74, 64],
    'Ice_Cream_sales': [215, 325, 185, 332, 406, 522, 412, 614, 544, 445, 408]
}

df = pd.DataFrame(ice_cream_data, columns=['Temperature_in_Fahrenheit', 'Ice_Cream_sales'])

X = df[['Temperature_in_Fahrenheit']]
Y = df['Ice_Cream_sales']

# splitting X and Y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

# create linear regression object
reg = LinearRegression()

# train the model using the training sets
reg.fit(X_train, y_train)

# prediction
y_predict = reg.predict(X_test)

# plotting residual errors in training data
plt.scatter(reg.predict(X_train), reg.predict(X_train) - y_train,
            color="green", s=10, label='Train data')

# plotting residual errors in test data
plt.scatter(reg.predict(X_test), reg.predict(X_test) - y_test,
            color="blue", s=10, label='Test data')

# plotting line for zero residual error
plt.hlines(y=0, xmin=0, xmax=2000, linewidth=2)

# plotting legend
plt.legend(loc='upper right')

# plot title
plt.title("Residual errors")

# show plot
plt.show()
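The same single-feature regression can also be expressed as the simplest possible neural network (a single Dense unit, i.e. y = wx + b). The sketch below uses Keras and assumes TensorFlow is installed; it is an illustrative addition, not part of the original example, and the feature is scaled so plain SGD converges quickly.

import numpy as np
from tensorflow import keras

x = np.array([58, 62, 52, 60, 66, 74, 68, 80, 76, 74, 64], dtype=float)
y = np.array([215, 325, 185, 332, 406, 522, 412, 614, 544, 445, 408], dtype=float)
x = ((x - x.mean()) / x.std()).reshape(-1, 1)   # scale the feature for faster convergence

model = keras.Sequential([
    keras.layers.Input(shape=(1,)),
    keras.layers.Dense(1),     # one weight and one bias: y = wx + b
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1), loss="mse")
model.fit(x, y, epochs=500, verbose=0)

w, b = model.get_weights()
print(w, b)   # learned weight and bias of the single neuron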
 
      

RABBITMQ

What is RabbitMQ?

RabbitMQ is an open source message broker software. It accepts messages from producers and delivers them to consumers. It acts like a middleman that can be used to reduce the load and delivery times of web application servers.

Features of RabbitMQ:

  • Robust messaging for building applications in a distributed manner.
  • Easy to use
  • Runs on all major Operating Systems.
  • Supports a huge number of developer platforms
  • Supports multiple messaging protocols, message queuing, delivery acknowledgement, flexible routing to queues, multiple exchange type.
  • Open source and commercially supported
How RabbitMQ Works?

The producer sends messages to an exchange. An exchange is responsible for routing messages to the different queues. An exchange accepts messages from the producer application and routes them to message queues with the help of bindings and routing keys. A binding is a link between a queue and an exchange. Consumers then receive messages from the queue.

Prerequisites:
  • RabbitMQ
  • Python

How to Send and Receive a message using RABBITMQ?

Send a Message using RabbitMQ:

The following program, send.py, will send a single message to the queue.

Step 1: Establish a connection with the RabbitMQ server.

 
       
        import pika
        
        connection = pika.BlockingConnection(
            pika.ConnectionParameters(host='localhost'))
        channel = connection.channel()
       
 

Step 2: Create a hello queue to which the message will be delivered:


channel.queue_declare(queue='hello')
     
   

Step 3: Publish the message, mentioning the exchange details and the queue name in the exchange and routing_key parameters so the message goes to the right queue.


channel.basic_publish(exchange='', routing_key='hello', body='Hello RabbitMQ!')
        print(" [x] Sent 'Hello RabbitMQ!'")
        

Step 4: Close the connection to make sure the network buffers are flushed and the message is actually delivered to RabbitMQ.


connection.close()
       
 
Receive a message using RabbitMQ:

The following program, receive.py, will consume messages from the queue.

Step 1: It works by subscribing a callback function to a queue. Whenever a message is received, this callback function is called by the Pika library. The following function will print the contents of the message to the screen.


def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
       
 

Step 2: Next, we need to tell RabbitMQ that this particular callback function should receive messages from our hello queue:


channel.basic_consume(
    queue='hello', on_message_callback=callback, auto_ack=True)

        

Step 3: And finally, enter a never-ending loop that waits for data and runs callbacks whenever necessary.


print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()
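For reference, here are the pieces above assembled into a complete receive.py, mirroring the connection setup and queue declaration already used in send.py:

import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

# declaring the queue is idempotent, so it is safe to do on both sides
channel.queue_declare(queue='hello')

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)

channel.basic_consume(
    queue='hello', on_message_callback=callback, auto_ack=True)

print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()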
     
   

Step 4: Open a terminal and run send.py. The producer program will stop after every run:

python send.py
[x] Sent ‘Hello RabbitMQ!’

We can go to the web browser, hit the URL http://localhost:15672/, and see the count of messages sent in the dashboard, as shown below:

Step 5: Open terminal. Run the receive.py program.

python receive.py
[*] Waiting for messages. To exit press CTRL+C
[x] Received ‘Hello RabbitMQ!’

If the Ready and Total counts are zero in the dashboard, it confirms that the messages have been received by the consumer.

Note: You can keep sending messages through RabbitMQ. Notice that the receive.py program doesn't exit; it stays ready to receive further messages and may be interrupted with Ctrl-C.
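
Putting the receiving steps together, a complete receive.py looks roughly like this (a minimal sketch mirroring the fragments above and assuming a local broker):

        import pika

        # Connect to the local RabbitMQ broker and declare the queue
        connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
        channel = connection.channel()
        channel.queue_declare(queue='hello')

        # Callback invoked by Pika for every message delivered from the queue
        def callback(ch, method, properties, body):
            print(" [x] Received %r" % body)

        channel.basic_consume(queue='hello', on_message_callback=callback, auto_ack=True)

        print(' [*] Waiting for messages. To exit press CTRL+C')
        channel.start_consuming()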

ELK – Elasticsearch, Logstash & Kibana

Introduction

As more and more IT infrastructures move to public clouds such as Amazon Web Services, Microsoft Azure, and Google Cloud, public cloud security tools, and logging platforms are both becoming more and more critical.

The ELK Stack is popular because it fulfills a specific need in the log management and log analysis space. In cloud-based infrastructures, consolidating log outputs to a central location from different sources like web servers, mail servers, database servers, network appliances can be particularly useful. This is especially true when trying to make better data-informed decisions. The ELK stack simplifies searching and analyzing data by providing insights in real-time from the log data.

It is common to run the full ELK Stack, not each individual component separately. Each of these services plays an important role, and in order to perform under high demand it is more advantageous to deploy each service on its own server.

Why ELK?
• Rapid on-premise (or cloud) installation and easy to deploy
• Scales vertically and horizontally
• A variety of easy-to-use APIs
• Writing queries is much easier than writing a MapReduce job
• Availability of libraries for most programming/scripting languages
• Elastic offers a host of language clients for Elasticsearch, including Ruby, Python, PHP, Perl, .NET, Java, JavaScript, and more
• Tools availability
• It’s free (open source), and it’s quick

The ELK Stack is used for log collection, indexing, and visualization of the collected log data. We can collect any type of logs (Windows event logs, HTTP logs, Apache server logs, etc.) in the ELK Stack as per the configuration.

Logstash:

Logstash is a light-weight, open-source, server-side data processing pipeline that allows you to collect data from a variety of sources, transform it on the fly, and send it to your desired destination. It is most often used as a data pipeline for Elasticsearch, an open-source analytics and search engine. Because of its tight integration with Elasticsearch, powerful log processing capabilities, and over 200 pre-built open-source plugins that can help you easily index your data, Logstash is a popular choice for loading data into Elasticsearch.

Logstash allows you to easily ingest unstructured data from a variety of data sources including system logs, website logs, and application server logs. Logstash offers pre-built filters, so you can readily transform common data types, index them in Elasticsearch, and start querying without having to build custom data transformation pipelines.

The server component of Logstash processes incoming logs. In other words, Logstash collects, parses, and enriches logs before indexing them into Elasticsearch.

It is the pipeline which collects log data and pushes the collected data to the Elasticsearch.

Logstash Input Plugins

• Stdin – reads events from standard input
• File – streams events from files (similar to “tail -0F”)
• Syslog – reads syslog messages as events
• Eventlog – pulls events from the Windows Event Log
• Imap – reads mail from an IMAP server
• Rss – captures events from RSS/Atom web feeds
• Snmptrap – creates events based on SNMP trap messages
• Twitter – reads events from the Twitter Streaming API
• Irc – reads events from an IRC server
• Exec – captures the output of a shell command as an event
• Elasticsearch – reads query results from an Elasticsearch cluster

Logstash Filter Plugins

• Grok – parses unstructured event data into fields
• Mutate – performs mutations on fields
• Geoip – adds geographical information about an IP address
• Date – parses dates from fields to use as the Logstash timestamp for an event
• Cidr – checks IP addresses against a list of network blocks
• Drop – drops all events

Logstash Output Plugins

• Stdout – prints events to the standard output
• Csv – write events to disk in a delimited format
• Email – sends email to a specified address when output is received
• Elasticsearch – stores logs in Elasticsearch
• Exec – runs a command for a matching event
• File – writes events to files on disk
• mongoDB – writes events to MongoDB
• Redmine – creates tickets using the Redmine API

Elasticsearch

Elasticsearch is the data storage and indexing component of the ELK Stack: it stores the data and indexes it.

It is also a search engine based on Lucene, providing a distributed search engine with an HTTP web interface and schema-free JSON documents.

The distributed nature of Elasticsearch enables it to process large volumes of data in parallel, quickly finding the best matches for your queries. Elasticsearch operations such as reading or writing data usually take less than a second to complete. This lets you use Elasticsearch for near real-time use cases such as application monitoring and anomaly detection.

An index is like a ‘database’ in a relational database. It has a mapping which defines multiple types. We can think of an index as a type of data organization mechanism, allowing the user to partition data a certain way.

Other key concepts of Elasticsearch are replicas and shards, the mechanism Elasticsearch uses to distribute data around the cluster. Elasticsearch implements a clustered architecture that uses sharding to distribute data across multiple nodes, and replication to provide high availability.

The index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards. A shard is a Lucene index and an Elasticsearch index is a collection of shards. The application talks to an index, and Elasticsearch routes the requests to the appropriate shards.
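
Because Elasticsearch exposes everything over an HTTP interface with schema-free JSON documents, you can index and search with any HTTP client. Below is a minimal sketch using Python's requests library against a local node (the index name and document fields are invented for the example):

        import requests

        ES = "http://localhost:9200"

        # Index a schema-free JSON document into the 'logs' index
        doc = {"host": "web-01", "level": "ERROR", "message": "disk almost full"}
        requests.post(f"{ES}/logs/_doc", json=doc)

        # Search the index for documents whose message matches 'disk'
        query = {"query": {"match": {"message": "disk"}}}
        resp = requests.get(f"{ES}/logs/_search", json=query)
        print(resp.json()["hits"]["total"])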

Kibana:

Kibana is the visualization web interface through which we can visualize the indexed log data. Kibana is an open-source data visualization and exploration tool used for log and time-series analytics, application monitoring, and operational intelligence use cases. It offers powerful and easy-to-use features such as histograms, line graphs, pie charts, heat maps, and built-in geospatial support.

How to Import BACPAC File Created from Azure SQL Database?

When you need to export a database for archiving or for moving to another platform, you can export the database schema and data to a BACPAC file. A BACPAC file is a ZIP file with an extension of BACPAC containing the metadata and data from a SQL Server database. A BACPAC file can be stored in Azure Blob storage or in local storage in an on-premises location and later imported back into Azure SQL Database or into a SQL Server on-premises installation.

Import BACPAC File to On-Premise SQL Server :

  • C:\Program Files (x86)\Microsoft SQL Server\140\DAC\bin>
  • SqlPackage.exe /a:import /sf:\\Userdb0.bacpac /tsn:SERVER-SQL\DEV2016 /tdn:Azure_Test /p:CommandTimeout=2400

Error :

When you try to import a BACPAC file created from an Azure environment, you might encounter the following error if it contains an external data source reference.

TITLE: Microsoft SQL Server Management Studio
            
    Could not import package. 
    Warning SQL72012: The object [AzureProd] exists in the target, 
    but it will not be dropped even though you selected the 
    ‘Generate drop statements for objects that are in the target database but that 
    are not in the source’ check box. 
    Warning SQL72012: The object [AzureProd_Log] exists in the target,
    but it will not be dropped even though you selected the 
    ‘Generate drop statements for objects that are in the target database but that 
    are not in the source’ check box. 
    Error SQL72014: .Net SqlClient Data Provider: Msg 102, Level 15, State 1, 
    Line 1 Incorrect syntax near ‘EXTERNAL’. 
    Error SQL72045: Script execution error. The executed script: 
    CREATE EXTERNAL DATA SOURCE [DB_EXT_EDS]
    WITH (
    TYPE = RDBMS,
    LOCATION = N’sqlserver.database.windows.net’,
    DATABASE_NAME = N’AdventureWorks’, 
    CREDENTIAL = [DB_EXT_CRED] ); 
            
    

Solution :

Drop external Tables and External Data Source in Azure SQL Database and create BACPAC File again without those references.

Drop External Tables and External Data Source

            
        IF EXISTS (SELECT 'x' FROM sys.external_tables)
        BEGIN
            DROP EXTERNAL TABLE EXT_Table1;
            DROP EXTERNAL TABLE EXT_Table2;
            DROP EXTERNAL TABLE EXT_Table3;
        END

        IF EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'DB_EXT_EDS')
        BEGIN
            DROP EXTERNAL DATA SOURCE DB_EXT_EDS;
        END
            
        

If you can’t recreate BACPAC without dropping the tables, you can follow these steps.

  1. Change the file extension to zip, then decompress it into a folder. Surprisingly, a bacpac is actually just a zip file, not something proprietary and hard to get into.
  2. Find the model.xml file and edit it to remove the section that looks like this:
<Element Type="SqlExternalDataSource" Name="[BoxDataSrc]">
  <Property Name="DataSourceType" Value="1" />
  <Property Name="Location" Value="MYAZUREServer.database.windows.net" />
  <Property Name="DatabaseName" Value="MyAzureDb" />
  <Relationship Name="Credential">
    <Entry>
      <References Name="[SQL_Credential]" />
    </Entry>
  </Relationship>
</Element>

If you have multiple external data sources of this type, you will probably need to repeat step 2 for each one.

Save and close model.xml.

Now you need to re-generate the checksum for model.xml so that the bacpac doesn’t think it was tampered with (since you just tampered with it). Create a PowerShell file named computeHash.ps1 and put this code into it.

Generate Checksum

             
            $modelXmlPath = Read-Host "model.xml file path"
            $hasher = [System.Security.Cryptography.HashAlgorithm]::Create("System.Security.Cryptography.SHA256CryptoServiceProvider")
            $fileStream = New-Object System.IO.FileStream -ArgumentList @($modelXmlPath, [System.IO.FileMode]::Open)
            $hash = $hasher.ComputeHash($fileStream)
            $hashString = ""
            foreach ($b in $hash) { $hashString += $b.ToString("X2") }
            $fileStream.Close()
            $hashString
            
     

Run the PowerShell script and give it the filepath to your unzipped and edited model.xml file. It will return a checksum value.
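
If you prefer not to use PowerShell, the same SHA-256 checksum can be computed with a few lines of Python (a sketch; point the path at your unzipped and edited model.xml):

        import hashlib

        model_xml_path = "model.xml"  # path to the edited model.xml

        # Compute the SHA-256 digest of the file and print it as uppercase hex,
        # which is the format used in Origin.xml
        with open(model_xml_path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest().upper()
        print(digest)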

Copy the checksum value, then open up Origin.xml and replace the existing checksum, toward the bottom on the line that looks like this:

<Checksum Uri="/model.xml">9EA0F06B282G4F42955C78A98822A31AA0ED0225CB131B8759379055A482D01G</Checksum>

Save and close Origin.xml, then select all the files and put them into a new zip file and rename the extension to bacpac.

Now you can use this new bacpac to import the database without getting the error.

Analytics

Analytics is the discovery, interpretation, and communication of meaningful patterns in data; and the process of applying those patterns towards effective decision making. In other words, analytics can be understood as the connective tissue between data and effective decision making, within an organization. Organizations may apply analytics to business data to describe, predict, and improve business performance.
Big data analytics is the complex process of examining large and varied data sets — or big data — to uncover information including hidden patterns, unknown correlations, market trends and customer preferences that can help organizations make informed business decisions.

Glue, Athena and QuickSight are 3 services under the Analytics Group of services offered by AWS. Glue is used for ETL, Athena for interactive queries and Quicksight for Business Intelligence (BI).

Glue

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. We can create and run an ETL job with a few clicks in the AWS Management Console. We simply point AWS Glue to our data stored on AWS, and AWS Glue discovers our data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, our data is immediately searchable, queryable, and available for ETL.

In this blog we will look at 2 components of Glue – Crawlers and Jobs

Glue Crawlers

Glue crawlers can scan data in all kinds of repositories, classify it, extract schema information from it, and store the metadata automatically in the AWS Glue Data Catalog. From there it can be used to guide ETL operations.

Suppose we have a file named people.json in S3 with the below contents:

                
        {"name":"Ricky","age":22}
        {"name":"Jeff","age":36}
        {"name":"Geddy","age":62}
                
        

Below are the steps to crawl this data and create a table in AWS Glue to store this data (a scripted boto3 equivalent is sketched after the list):

  1. On the AWS Glue Console, click “Crawlers” and then “Add Crawler”.
  2. Give a name for your crawler and click Next.
  3. Select S3 as the data source and, under “Include path”, give the location of the JSON file on S3.
  4. Since we are going to crawl data from only one dataset, select No in the next screen and click Next.
  5. In the next screen, select an IAM role which has access to the S3 data store.
  6. Select Frequency as “Run on demand” in the next screen.
  7. Select a database to store the crawler’s output. I chose a database named “saravanan”. If no database exists, add a database using the link given.
  8. Review all details in the next step and click Finish.
  9. On the next screen, click “Run it now” to run the crawler.
  10. The crawler runs for around a minute, and finally you will be able to see the status as Stopping / Ready with a Tables added count of 1.
  11. Now you can go to the Tables link and see that a table named “people_json” has been created under the “saravanan” database.
  12. Using the “View details” action and then scrolling down, you can see the schema for the table which Glue has automatically inferred and generated.
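
For reference, the console steps above can also be scripted with boto3 (a sketch; the crawler name, IAM role ARN, and S3 path are placeholders you would replace with your own):

        import boto3

        glue = boto3.client("glue")

        # Create a crawler that scans the S3 path and writes metadata
        # to the 'saravanan' database in the Glue Data Catalog
        glue.create_crawler(
            Name="people-json-crawler",
            Role="arn:aws:iam::123456789012:role/GlueS3AccessRole",  # placeholder role ARN
            DatabaseName="saravanan",
            Targets={"S3Targets": [{"Path": "s3://my-bucket/people/"}]},
        )
        glue.start_crawler(Name="people-json-crawler")

        # Once the crawler finishes, the inferred table and schema can be inspected
        table = glue.get_table(DatabaseName="saravanan", Name="people_json")
        print(table["Table"]["StorageDescriptor"]["Columns"])
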
Glue Jobs

The AWS Glue Jobs system provides managed infrastructure to orchestrate our ETL workflow. We can create jobs in AWS Glue that automate the scripts we use to extract, transform, and transfer data to different locations. Jobs can be scheduled and chained, or they can be triggered by events such as the arrival of new data.

To add a new job using the console

  1. Open the AWS Glue console, and choose the Jobs tab.
  2. Choose Add job and follow the instructions in the Add job wizard. The screens below copy data from the table we created earlier to a Parquet file named people-parquet in the same S3 bucket.




After the above job runs and completes, you will be able to verify in S3 that the output Parquet file has been created.

DynamicFrame

Glue Jobs use a data structure named DynamicFrame. A DynamicFrame is similar to a Spark DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on-the-fly when required, and explicitly encodes schema inconsistencies using a choice (or union) type.

Instead of just using the Python job which Glue generates, we can code our own jobs using DynamicFrames and have Glue run them. The job below selects specific fields from two Glue tables, renames some of the fields, joins the tables, and writes the joined table to S3 in Parquet format:

                
        import sys
        from awsglue.transforms import *
        from awsglue.utils import getResolvedOptions
        from pyspark.context import SparkContext
        from awsglue.context import GlueContext

        glueContext = GlueContext(SparkContext.getOrCreate())

        # Load the two catalog tables as DynamicFrames
        users = glueContext.create_dynamic_frame.from_catalog(
            database="saravanan",
            table_name="users")
        users_courses = glueContext.create_dynamic_frame.from_catalog(
            database="saravanan",
            table_name="users_courses")

        # Keep only the fields we need and rename the ambiguous ones
        users = users.select_fields(['AccountName', 'Id', 'UserName', 'FullName', 'Active']) \
            .rename_field('Active', 'UserActive')
        users_courses = users_courses.select_fields(['UserId', 'Id', 'Name', 'Code', 'Active',
                'Complete', 'PercentageComplete', 'Overdue']) \
            .rename_field('Id', 'Course_Id') \
            .rename_field('Name', 'CourseName') \
            .rename_field('Code', 'CourseCode') \
            .rename_field('Active', 'CourseActive') \
            .rename_field('Complete', 'CourseComplete') \
            .rename_field('PercentageComplete', 'CoursePercentageComplete') \
            .rename_field('Overdue', 'CourseOverDue')

        # Join on user id and write the joined table to S3 as partitioned Parquet
        joined_table = Join.apply(users, users_courses, 'Id', 'UserId').drop_fields(['Id'])
        joined_table.toDF().write.parquet('s3://saravanan-glue/parquet_partitioned',
            partitionBy=['AccountName'])
                
        


Athena

Amazon Athena is an interactive query service that makes it easy to analyse data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and we pay only for the queries that we run.

Athena is easy to use. We must simply point to our data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare our data for analysis. This makes it easy for anyone with SQL skills to quickly analyse large-scale datasets.

Athena is out-of-the-box integrated with AWS Glue Data Catalog, allowing us to create a unified metadata repository across various services, crawl data sources to discover schemas and populate your Catalog with new and modified table and partition definitions, and maintain schema versioning.

Since Athena uses same Data Catalog as Glue, we will be able to query and view properties of the people_json table which we created earlier using Glue.

Also, we can create a new table using data from an S3 bucket as shown below:


Unlike Glue, we have to explicitly give the data format (CSV, JSON, etc) and specify the column names and types while creating the table in Athena.


We can also manually create and query the tables using SQL as shown below:
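
Queries can also be issued programmatically. A minimal boto3 sketch against the crawled people_json table (the result S3 location is a placeholder bucket you must own):

        import time
        import boto3

        athena = boto3.client("athena")

        # Start the query against the Glue Data Catalog database
        execution = athena.start_query_execution(
            QueryString="SELECT name, age FROM people_json WHERE age > 30",
            QueryExecutionContext={"Database": "saravanan"},
            ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
        )
        query_id = execution["QueryExecutionId"]

        # Poll until the query finishes, then fetch the result rows
        while True:
            status = athena.get_query_execution(QueryExecutionId=query_id)
            state = status["QueryExecution"]["Status"]["State"]
            if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
                break
            time.sleep(1)

        if state == "SUCCEEDED":
            rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
            for row in rows:
                print([col.get("VarCharValue") for col in row["Data"]])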

QuickSight

Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy for us to deliver insights to everyone in our organization.

QuickSight lets us create and publish interactive dashboards that can be accessed from browsers or mobile devices. We can embed dashboards into our applications, providing our customers with powerful self-service analytics.

QuickSight easily scales to tens of thousands of users without any software to install, servers to deploy, or infrastructure to manage.

Below are the steps to create a sample Analysis in QuickSight:

  1. Any Analysis in QuickSight requires data from a Data Set. First click on the “Manage data” link at top right to list the Data Sets we currently have.
  2. To create a new Data Set, click the “New data set” link
  3. We can create Data Set from any of the Data sources listed here – uploading a file, S3, Athena table, etc.
  4. For our example, I am selecting Athena as data source and giving it a name “Athena source”. Then we must map this to a database / table in Athena.
  5. After we select the Athena table, QuickSight provides us an option to import the data to SPICE. SPICE is Amazon QuickSight’s in-memory optimized calculation engine, designed specifically for fast, adhoc data visualization. SPICE stores our data in a system architected for high availability, where it is saved until we choose to delete it.
  6. Using the Edit/Preview Data option above allows us to select the columns to be included in Data set and rename them if required.
  7. Once we click the “Save & visualize” link above, QuickSight starts creating an Analysis for us. For our exercise we will select the Table visual type from the list.
  8. Add Account Name and User_id by dragging them from “Fields list” to “Group by” and course_active to “Value”
  9. Now we will add 2 parameters for Account Name and Learner id by clicking on Parameters at bottom left. While creating the parameter use the option “Link to a data set field” for Values and link the parameter to the appropriate column in the Athena table
  10. Once the parameters are added, create controls for the parameters. If we are adding 2 parameters with controls, we have option of showing only relevant values for second parameter based on the values selected for first parameter. For this select the “Show relevant values only” checkbox.
  11. Next, add 2 custom filters for Account Name and Learner id. These filters should be mapped to the parameters we created earlier. For this, choose the Filter type as “Custom filter” and select the “Use parameters” checkbox.
  12. Now using the Visualize option, we can verify if our Controls are working correctly
  13. To share the Dashboard with others, use the share option on top towards the right and use publish dashboard. We can search for users of the AWS account by email and publish to selective users.
Messaging system

A Messaging System is responsible for transferring data from one application to another, so the applications can focus on data, but not worry about how to share it. In Big Data, an enormous volume of data is used.

Two types of messaging patterns are available:

Point-to-Point Messaging System

In a point-to-point system, messages are persisted in a queue. One or more consumers can consume the messages in the queue, but a particular message can be consumed by a maximum of one consumer only.

Publish-Subscribe Messaging System

In the publish-subscribe system, messages are persisted in a topic. Unlike the point-to-point system, consumers can subscribe to one or more topics and consume all the messages in those topics. In the publish-subscribe system, message producers are called publishers and message consumers are called subscribers.

Kafka

Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and enables us to pass messages from one end-point to another.

The following diagram illustrates the main terminologies.

Topics

A stream of messages belonging to a category is called a topic. Data is stored in topics. Kafka topics are analogous to radio / TV channels. Multiple consumers can subscribe to same topic and consume the messages.

Topics are split into partitions. For each topic, Kafka keeps a minimum of one partition. Each such partition contains messages in an immutable ordered sequence.

Partition offset

Each partitioned message has a unique sequence id called as offset. For each topic, the Kafka cluster maintains a partitioned log that looks like this:

Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.

The Kafka cluster durably persists all published records—whether or not they have been consumed—using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space.

Replicas of partition

Replicas are nothing but backups of a partition. Replicas never serve reads or writes of data; they are used to prevent data loss.

Brokers

Brokers are simple systems responsible for maintaining the published data. Each broker may host zero or more partitions per topic.

Kafka Cluster

A Kafka deployment having more than one broker is called a Kafka cluster.

Kafka Cluster Architecture

Zookeeper

ZooKeeper is used for managing and coordinating Kafka brokers. The ZooKeeper service is mainly used to notify producers and consumers about the presence of a new broker in the Kafka system, or about the failure of a broker.

Consumer Group

Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group.

Kafka features
  1. High Throughput: Provides support for hundreds of thousands of messages with modest hardware.
  2. Scalability: Highly scalable distributed system with no downtime
  3. Data Loss: Kafka ensures no data loss once configured properly
  4. Stream processing: Kafka can be used along with real time streaming applications like Spark and Storm
  5. Durability: Provides support to persisting messages on disk
  6. Replication: Messages can be replicated across clusters, which supports multiple subscribers
Installing and Getting started

Prerequisite: Install Java

  1. Download kafka .tgz file from https://kafka.apache.org/downloads
  2. Untar the file and go into the kafka directory
              
             > tar -xzf kafka_2.11-2.1.0.tgz
             > cd kafka_2.11-2.1.0 
             
  3. Start the zookeeper server using the properties in zookeeper.properties
              
            > bin/zookeeper-server-start.sh config/zookeeper.properties
              
            
  4. Start the Kafka broker using the properties in server.properties
              
            > bin/kafka-server-start.sh config/server.properties 
             
            
Create a topic

Let’s create a topic named “test” with a single partition and only one replica:
In the below command, 2181 is the port number we specified in zookeeper.properties.

            
        > bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
            
        

To list the current topics, we can query ZooKeeper using the below command:

                
        > bin/kafka-topics.sh --list --zookeeper localhost:2181
        test
                
        
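Topics can also be created from Python with the same kafka-python package that is used for the producer and consumer later in this post (a sketch assuming the broker above is reachable on localhost:9092):

        from kafka import KafkaConsumer
        from kafka.admin import KafkaAdminClient, NewTopic

        # Create a 'test2' topic with a single partition and one replica
        admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
        admin.create_topics([NewTopic(name="test2", num_partitions=1, replication_factor=1)])

        # List the topics known to the cluster
        consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
        print(consumer.topics())
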
Command line producer

Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message. Run the producer and then type a few messages into the console to send to the server

(In below command 9092 is the port number we configured for the broker in server.properties)

                
                    > bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
        This is a message
        This is another message
                
        
Command line consumer

Kafka also has a command line consumer that will dump out messages to standard output.

                
                    > bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
        This is a message
        This is another message
                
        
Kafka producer and consumer using python

The Kafka producer and consumer can be coded in many languages like Java, Python, etc. In this section, we will see how to send and receive messages from a Kafka topic using Python.

  1. First we have to install the kafka-python package using python package manager.
                            
                                pip install kafka-python
                            
                            
  2. The below Python program reads records from an input file and sends them as messages to the test topic which we created in the previous section.

    Producer.py

                    
            from kafka import KafkaProducer

            producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

            with open('/example/data/inputfile.txt') as f:
                lines = f.readlines()

            # Kafka messages are bytes, so encode each line before sending
            for line in lines:
                producer.send('test', line.encode('utf-8'))

            # Make sure buffered messages are actually sent before exiting
            producer.flush()
                    
            

The below Python program consumes the messages from the Kafka topic and prints them on the screen.

Consumer.py

                
        from kafka import KafkaConsumer

        consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'],
                                 auto_offset_reset='earliest')
        consumer.subscribe(['test'])

        for msg in consumer:
            print(msg)
                
        

In the above program, we have

        auto_offset_reset='earliest'

This causes the consumer program to read all the messages from the beginning every time it is run.

If we instead use

        auto_offset_reset='latest'

the consumer program will read only the messages produced after it starts, rather than re-reading the older ones.

Helm

Helm is a package manager for Kubernetes that allows developers and operators to more easily package, configure, and deploy applications and services onto Kubernetes clusters.

What Does Kubernetes Helm Solve?

Kubernetes is known as a complex platform to understand and use. Kubernetes Helm helps make Kubernetes easier and faster to use:

Increased productivity – developers can deploy a pre-tested app via a Helm chart and focus on developing their applications, instead of spending time on deploying test environments to test their Kubernetes clusters

Existing Helm Charts – allow developers to get a working database, big data platform, CMS, etc. deployed for their application with one click. Developers can modify existing charts or create their own to automate dev, test or production processes.

Easier to start with Kubernetes – it can be difficult to get started with Kubernetes and learn how to deploy production-grade applications. Helm provides one click deployment of apps, making it much easier to get started and deploy your first app, even if you don’t have extensive container experience.

Decreased complexity – deployment of Kubernetes-orchestrated apps can be extremely complex. Using incorrect values in configuration files or failing to roll out apps correctly from YAML templates can break deployments. Helm Charts allow the community to preconfigure applications, defining values that are fixed and others that are configurable with sensible defaults, providing a consistent interface for changing configuration. This dramatically reduces complexity, and eliminates deployment errors by locking out incorrect configurations.

Production ready – running Kubernetes in production with all its components (pods, namespaces, deployments, etc.) is difficult and prone to error. With a tested, stable Helm chart, users can deploy to production with confidence, and reduce the complexity of maintaining a Kubernetes App Catalog.

No duplication of effort – once a developer has created a chart, tested and stabilized it once, it can be reused across multiple groups in an organization and outside it. Previously, it was much more difficult (but not impossible) to share Kubernetes applications and replicate them between environments.

Helm provides this functionality through the following components:

  • A command line tool, helm, which provides the user interface to all Helm functionality.
  • A companion server component, tiller, that runs on your Kubernetes cluster, listens for commands from helm, and handles the configuration and deployment of software releases on the cluster.
  • The Helm packaging format, called charts.
  • An official curated charts repository with prepackaged charts for popular open-source software projects.
Installing Helm

There are two parts to Helm: The Helm client (helm) and the Helm server (Tiller).

INSTALLING THE HELM CLIENT

The Helm client can be installed either from source, or from pre-built binary releases.

From the Binary Releases

Every release of Helm provides binary releases for a variety of OSes. These binary versions can be manually downloaded and installed.

Download your desired version

Unpack it (tar -zxvf helm-v2.0.0-linux-amd64.tgz)

Find the helm binary in the unpacked directory, and move it to its desired destination (mv linux-amd64/helm /usr/local/bin/helm)

INSTALLING TILLER

Tiller, the server portion of Helm, typically runs inside of your Kubernetes cluster. But for development, it can also be run locally, and configured to talk to a remote Kubernetes cluster.

The easiest way to install tiller into the cluster is simply to run helm init. This will validate that helm’s local environment is set up correctly (and set it up if necessary). Then it will connect to whatever cluster kubectl connects to by default (kubectl config view). Once it connects, it will install tiller into the kube-system namespace.

After helm init, you should be able to run kubectl get pods --namespace kube-system and see Tiller running.

USING HELM

A Chart is a Helm package. It contains all of the resource definitions necessary to run an application, tool, or service inside of a Kubernetes cluster.

A Repository is the place where charts can be collected and shared.

A Release is an instance of a chart running in a Kubernetes cluster. One chart can often be installed many times into the same cluster. And each time it is installed, a new release is created.

The “helm install” command can install from several sources:

  • A chart repository
  • A local chart archive (helm install foo-0.1.1.tgz)
  • An unpacked chart directory (helm install path/to/foo)
  • A full URL (helm install https://example.com/charts/foo-1.2.3.tgz)
Charts

Helm uses a packaging format called charts. A chart is a collection of files that describe a related set of Kubernetes resources.

THE CHART FILE STRUCTURE

A chart is organized as a collection of files inside of a directory. The directory name is the name of the chart (without versioning information). Thus, a chart describing WordPress would be stored in the wordpress/ directory.

Inside of this directory, Helm will expect a structure that matches this:

 
     wordpress/
     Chart.yaml          # A YAML file containing information about the chart
     LICENSE             # OPTIONAL: A plain text file containing the license for the chart
     README.md           # OPTIONAL: A human-readable README file
     requirements.yaml   # OPTIONAL: A YAML file listing dependencies for the chart
     values.yaml         # The default configuration values for this chart
     charts/             # A directory containing any charts upon which this chart depends.
     templates/          # A directory of templates that, when combined with values,
                         # will generate valid Kubernetes manifest files.
     templates/NOTES.txt # OPTIONAL: A plain text file containing short usage notes
 
EXAMPLE

Let’s build and publish a simple HTTP service that says “Hello world”, then package and publish it via Helm.

 
Docker: Build and publish “Hello World”

app/index.html:

        Hello world!

Dockerfile:

        FROM busybox
        ADD app/index.html /www/index.html
        EXPOSE 8005
        CMD httpd -p 8005 -h /www; tail -f /dev/null

Build, run, tag, and push the image to Docker Hub:

        docker build -t hello-world .
        docker run -p 80:8005 hello-world
        ## open your browser and check http://localhost/
        docker login
        docker tag hello-world {your_dockerhub_user}/hello-world
        docker push {your_dockerhub_user}/hello-world:latest
 

Helm: build and install

We need Helm chart files; generate a skeleton chart and then edit values.yaml with the image and service settings:

        helm create helloworld-chart

helloworld-chart/values.yaml:

        image:
          repository: {your_dockerhub_user}/hello-world
          tag: latest
          pullPolicy: IfNotPresent
        service:
          name: hello-world
          type: LoadBalancer
          externalPort: 80
          internalPort: 8005

Now, we need to package this helm chart

 
     helm package helloworld-chart --debug
        ## helloworld-chart-0.1.0.tgz file was created
        helm install helloworld-chart-0.1.0.tgz --name helloworld
        kubectl get svc --watch # wait for a IP
 
CHART REPOSITORIES

A chart repository is an HTTP server that houses one or more packaged charts. Any HTTP server that can serve YAML files and tar files and can answer GET requests can be used as a repository server.

Deadlocks in Azure SQL Database

Recently we were working with Azure Logic Apps to invoke Azure Functions.
By default, the Logic App runs parallel threads, and we didn’t explicitly control the concurrency; we left the default values.

So the Logic App invoked several concurrent threads, which in turn invoked several Azure Functions.
The problem was that these Azure Functions made database calls which caused deadlocks. In an ideal world, the database should be able to handle numerous concurrent calls without deadlocks. However, our process touches a high percentage of shared data, and we wanted to ensure consistency, so we had explicit transactions in our stored procedure calls. That was the root cause of the problem, and we didn’t want to remove the explicit transactions.

The solution we implemented to alleviate this problem was to run the process in sequence instead of in parallel threads.

Logic App Concurrency Control Behavior

“For each” loops execute in parallel by default. Customize the degree of parallelism, or set it to 1 to execute in sequence.

Logic_App_Concurrency

Deadlock Exception

Transaction (Process ID 166) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Deadlock_Exception

Troubleshooting Deadlocks

So we have identified through Application Insights that a deadlock happened in the database. The next logical question is: what caused this deadlock?

Azure SQL Server Deadlock Count

These queries identify the deadlock event time as well as the deadlock event details.

                
        SELECT * FROM sys.event_log
        WHERE event_type = 'deadlock';

        WITH CTE AS (
            SELECT CAST(event_data AS XML) AS [target_data_XML]
            FROM sys.fn_xe_telemetry_blob_target_read_file('dl', null, null, null)
        )
        SELECT target_data_XML.value('(/event/@timestamp)[1]', 'DateTime2') AS Timestamp,
               target_data_XML.query('/event/data[@name=''xml_report'']/value/deadlock') AS deadlock_xml,
               target_data_XML.query('/event/data[@name=''database_name'']/value').value('(/value)[1]', 'nvarchar(100)') AS db_name
        FROM CTE
                
        

You can save the Deadlock xml as xdl to view the Deadlock Diagram. This provides all the information we need to identify the root cause of the deadlock and take necessary steps to resolve the issue.

Grafana

Grafana is an open-source, general-purpose dashboard and graph composer which runs as a web application.

You can monitor Azure services and applications from Grafana using the Azure Monitor data source plugin. The plugin gathers application performance data collected by the Application Insights SDK as well as infrastructure data provided by Azure Monitor. You can then display this data on your Grafana dashboard.

Grafana uses an Azure Active Directory service principal to connect to Azure Monitor APIs and collect metrics data. You must create a service principal to manage access to your Azure resources.

Why Grafana?

Grafana provides more visualization options than the Azure Portal. It also supports multiple data sources. One can combine data from multiple sources in a single dashboard. Grafana is designed for analyzing and visualizing metrics such as system CPU, memory, disk and I/O utilization. Users can create comprehensive charts with smart axis formats (such as lines and points) as a result of Grafana’s fast, client-side rendering — even over long ranges of time.

Grafana dashboards are what made Grafana such a popular visualization tool. Visualizations in Grafana are called panels, and users can create a dashboard containing panels for different data sources. Grafana supports graph, singlestat, table, heatmap and freetext panel types. Grafana users can make use of a large ecosystem of ready-made dashboards for different data types and sources. Grafana has no time series storage support. Grafana is only a visualization solution. Time series storage is not part of its core functionality.

Some of the features of Grafana are as follows

  • Optimized for Time series
  • Can pull data from Azure Metrics, Log Analytics and Application Insights
  • Azure Data Explorer (formerly known as Kusto) plugin also released.
  • Rich ecosystem of plugins for data sources and dashboards.
  • Open Source, easy to onboard using Docker, Azure App Service etc.
Azure-app-service

Some of the requirements of Grafana are described below.

  • Azure SPN (Service Principal Name) with reader access to subscription
  • Deploy in Azure web apps.
  • Data source plugin “grafana-azure-monitor-datasource”
  • Supports AD integration via LDAP
  • Easy to export/import and templatize
  • Very DevOps friendly
  • Huge collection of panels https://play.grafana.org
Grafana

Azure Monitor Data Source For Grafana

Azure Monitor is the platform service that provides a single source for monitoring Azure resources. The Azure Monitor Data Source plugin supports Azure Monitor, Azure Log Analytics and Application Insights metrics in Grafana.

Features

  • Support for all the Azure Monitor metrics
    • includes support for the latest API version that allows multi-dimensional filtering for the Storage and SQL metrics.
    • Automatic time grain mode which will group the metrics by the most appropriate time grain value
  • Application Insights metrics
    • Write raw log analytics queries, and select x-axis, y-axis, and grouped values manually.
    • Automatic time grain support
  • Support for Log Analytics (both for Azure Monitor and Application Insights)
  • You can combine metrics from both services in the same graph.
Grafana-graph

Azure Monitor for VMs provides an in-depth view of VM health, performance trends, and dependencies. Azure Monitor for VMs includes a set of performance charts that target several key performance indicators (KPIs) to help you determine how well a virtual machine is performing. Azure Monitor for VMs is focused on the operating system as manifested through the processor, memory, network adapters, and disks.

Azure Dashboards

Azure dashboards allow you to combine different kinds of data, including both metrics and logs, into a single pane in the Azure portal. You can optionally share the dashboard with other Azure users. Elements throughout Azure Monitor can be added to an Azure dashboard in addition to the output of any log query or metrics chart. Azure Monitor is a single source for monitoring Azure resources, and it acts as Azure’s time series database for all Azure metrics.

Some of the important aspects of Azure Dashboard

  • No setup required, already available within Azure Portal.
  • Zoom in zoom out for metrics not available
  • All data from Azure resources.
  • Log Analytics/AI queries cannot be parameterized based on Dashboard selection.
  • Query results can be pinned to dashboards
  • Some of the most useful panels are tied to specific products and can’t be customized.
    • E.g., the percentile panel is only available in “Container Insights” and VM Insights.
    • That panel cannot be used against a “Log Analytics” source or a Metric source.

Some of the features of Azure Dashboard are as follows

  • Supports visualizing most Azure resources
  • OOB Integrated with Azure RBAC
  • Supports Log Analytics, App Insights and Metrics
  • No Auto refresh per panel
  • No Zoom in Zoom out.
  • Dashboard queries don’t support variables

Azure Dashboards (VM insights/ Container Insights)

  • These tiles can only be accessed by navigating to the VM resource.
  • They cannot be pinned as is, but the detailed version of this can be pinned.
  • No zoom in zoom out capability.

Azure Dashboards – Metrics

  • These are pinnable
  • Don’t support percentiles
  • No drill ability
  • Each Panel is hard coded to a specific data source even if they might be the same behind the scenes.

Comparison between Grafana and Azure Dashboard is shown below.

Azure Dashboard
  • Multiple Azure resource types
  • Limited configuration options. Requires JSON editing
  • Application Insights → OOB Azure Dashboard
  • Only static queries
  • No setup required
  • Not intuitive for overlaying.
Grafana
  • Mostly Time Series
  • Highly configurable
  • Global variables as filters
  • Dashboard and individual panel refresh.
  • Supports query macros
  • Setup required (minimal)
  • Intuitive overlays

Introduction to Alexa

Amazon’s Alexa is the voice-activated, interactive AI bot, or intelligent personal assistant in the cloud, that lets people speak with their Amazon Echo, Echo Dot, and other Amazon smart home devices. Alexa is designed to respond to a number of commands and converse with people.

Alexa Skills are apps that give Alexa even more abilities. These skills can let her speak to more devices or websites. When the Alexa device is connected to the internet through Wi-Fi or Bluetooth, it wakes up on merely hearing “ALEXA”. Alexa Skills radically expand the bot’s repertoire, allowing users to perform more actions with voice-activated control through Alexa.

Overview of Alexa Skill

The most important part of an Alexa skill is its interaction design. Alexa skills don’t have visual feedback like web or desktop applications and must guide the user through the skill using voice. All Alexa skill replies need to tell the user clearly what the next options are.

Functional Architecture

An Alexa skill is a small application that interacts with Alexa via an AWS Lambda function.

Functional Architecture

Designing the Alexa Skill

The most important aspect of the skill is its vocal interface. The skill should interact naturally with the user. The components of an Alexa Skill are:

  • Alexa requires a word, often called a wake phrase, which alerts the device that it can expect a command immediately after. The default wake phrase is “Alexa”; it can also be “Amazon”, “Echo”, or “Computer”.
  • The launch phrase is the word that tells Alexa to trigger a skill. Examples of launch phrases are “OPEN”, “ASK”, “START”, and “LAUNCH”.
  • The invocation name is the name of the skill.
  • Intents are the goals that the user is trying to achieve by invoking the skill.
  • An utterance tells Alexa what the skill should do. Apart from static utterances such as Start and Launch, dynamic commands can be added. These dynamic commands are called slots.
  • Each intent can contain one or more slots. A slot is a variable that is parsed and exposed to the application code.

SampleUtterances

Alexa has a built-in natural language processing engine. To map a verbal phrase to an intent, Alexa handles the complexity of natural language processing with the help of a manually curated file, SampleUtterances.txt.

The first word in each line of SampleUtterances.txt is the intent name. The application code reads the value of the intent name and responds appropriately. Following the intent name is the phrase that the user says to achieve that intent. The user’s phrase may also include dynamic parts, which are defined as slots in the intent, and the application is free to react differently based on the presence or value of a slot. To give Alexa the best chance of understanding users, it is recommended to include as many sample utterances as possible. Depending on the skill, there could be any number of ever-changing sample utterances.

The below example sums up the entire vocal interface

entire vocal interface

Build and Publish a new skill

Building and publishing a new skill in Alexa comprises the below steps:

1. Create and Configure Skill :

Create an Alexa skill using  https://developer.amazon.com/alexa-skills-kit. This will open the skill information where we can specify the name of the skill and the invocation name.

2. Create Interaction Model:

The interaction model is a set of rules that defines the way the user interacts with your skill. As part of the interaction model, intents and utterances are defined. The intent schema should be in JSON format, and it should define an array of intents, each with a name and an optional list of dynamic parts (slots). Alexa will automatically train itself with the provided interaction model.

3. Coding the Backend system:

Once the interaction model has been designed, the Lambda function has to be coded and deployed.

For each intent, an input/output contract has to be implemented. The input is an IntentRequest which is a representation of the user’s request and includes all the slot values.

The response from Alexa can take one of multiple forms:

  • Ask the user a question and wait for response.
  • Give the details to the user and shut down.
  • Say nothing and shut down.

Alexa can either respond verbally or the response could be displayed on the phone.
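
For illustration, a bare-bones Python Lambda handler for a custom skill might look like the sketch below. It shows only the raw JSON request/response contract rather than the Alexa Skills Kit SDK, and the intent name HelloIntent is invented for the example:

        def lambda_handler(event, context):
            # The request type tells us whether the skill was just launched
            # or whether the user invoked a specific intent
            request = event['request']

            if request['type'] == 'IntentRequest' and request['intent']['name'] == 'HelloIntent':
                speech = 'Hello from your first Alexa skill!'
            else:
                speech = 'Welcome. You can say hello to get started.'

            # Minimal Alexa response: plain-text speech, keep the session open
            return {
                'version': '1.0',
                'response': {
                    'outputSpeech': {'type': 'PlainText', 'text': speech},
                    'shouldEndSession': False
                }
            }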

4. Deploying the Backend system:

The skill can be deployed as an AWS Lambda function with code written in Java, Node.js, Python, or C#. The simplest approach would be to code it in Node.js.

5. Testing the Skill:

Testing of the Skill can either be done through the test simulator available in the Developer console account or through the device connected to the development account.

6. Publishing the Skill:

To publish the skill, it has to be submitted by filling out the “Publishing Information” and the “Privacy & Compliance” sections.

Spring Initializer – using Java and Maven

Spring Initializr is a web tool provided by Spring at the official site https://start.spring.io/. We can create a Spring Boot project by providing the project details.

In the below example, we added the spring-boot-starter-web dependency to write REST endpoints.

spring_initializer_img

After providing the Group, Artifact, Dependencies, Build Project, Platform, and Version, click the Generate Project button. A zip file will be downloaded; after the download completes, unzip the file.

The maven file pom.xml will have the Web dependency we had selected above.

Web_dependency_img

Note that only the Spring Boot starter parent has a version number. Spring Boot starter web doesn’t have a version, as it is automatically configured based on the version of the parent.

You can find the main class file under the src/main/java directory with the default package.

directories_img

To write a simple Hello World Rest Endpoint in the Spring Boot Application main class file itself, follow the steps shown below:

  • Firstly, add the @RestController annotation at the top of the class.
  • Now, write a Request URI method with @RequestMapping annotation.
  • Then, the Request URI method should return the Hello World string.
application_main_img

Create an executable JAR by executing the below Maven command in the folder containing pom.xml:

C:\Users\SaravananP\Downloads\demo> mvn clean install

install_img

The .jar file will be created in the target folder as indicated above

Run the JAR file using java -jar and verify the results.

verify_img

result_img

Application Properties

In the above examples, we have seen that Spring Boot automatically configured Tomcat to run on port 8080. We can override this by specifying the port in the file src\main\resources\application.properties, for example server.port=9090.

port_img

If we rebuild the JAR and execute it, we will get an error at http://localhost:8080 and will be able to see the Hello World message at http://localhost:9090.

404_img

result_img1

Install and Run SQL Server Docker Container on Mac

Like most people, I use Mac, Windows, as well as Linux OS for development and testing purposes; primarily I use Mac for development. I have a few projects which use SQL Server as the data storage layer. Setting up a Docker container on Mac and opening up the ports was pretty easy and doesn’t take more than 10 minutes.

Steps followed :
  • Install Docker
  • Pull SQL Server Docker Image
  • Run SQL Server Docker Image
  • Install mssql Client
  • Install Kitematic
  • Open the Ports to connect to SQL Server from the network
  • Setup port forwarding to enable access outside the network
Install Docker :

Get the Docker dmg image and install it. Just follow the prompts; it’s very straightforward.
https://docs.docker.com/docker-for-mac/install/#download-docker-for-mac https://download.docker.com/mac/stable/Docker.dmg

Once you have installed Docker, you can verify the installation and version.

                bash-3.2$ docker -v
        Docker version 17.09.0-ce, build afdb6d4 
Pull SQL Server Docker Image ( DEV Version )
                docker pull microsoft/mssql-server-linux:2017-latest 
Create SQL Server Container from the Image and Expose it on port 1433 ( Default Port )
                docker run -d --name macsqlserver -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=Passw1rd' -e 'MSSQL_PID=Developer' -p 1433:1433 microsoft/mssql-server-linux:2017-latest 

-d: this launches the container in daemon mode, so it runs in the background

--name name_your_container (macsqlserver): give your Docker container a friendly name, which is useful for stopping and starting containers from the Terminal.

-e 'ACCEPT_EULA=Y': this sets an environment variable in the container named ACCEPT_EULA to the value Y. This is required to run SQL Server for Linux.

-e ‘SA_PASSWORD=Passw1rd’: this sets an environment variable for the sa database password. Set this to your own strong password. Also required.

-e ‘MSSQL_PID=Developer’: this sets an environment variable to instruct SQL Server to run as the Developer Edition.

-p 1433:1433: this maps the local port 1433 to the container’s port 1433. SQL Server, by default, listens for connections on TCP port 1433.

microsoft/mssql-server-linux: this final parameter tells Docker which image to use

Install SQL Client for MAC

If you don’t have npm installed on your Mac, install Homebrew and Node.

                ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
        brew install node
        node -v
        npm -v 
                $ npm install -g sql-cli
         
        /usr/local/bin/mssql -> /usr/local/lib/node_modules/sql-cli/bin/mssql
        /usr/local/lib
        └── sql-cli@0.6.2
         
        $ npm i -g npm 
Connect to SQL Server Instance
                $ mssql -u sa -p Passw1rd
        Connecting to localhost...done
         
        sql-cli version 0.6.2
        Enter ".help" for usage hints.
        mssql> select * from sys.dm_exec_connections 
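
If you prefer connecting from Python instead of the sql-cli tool, here is a quick sketch using the pymssql package (it assumes the container above is listening on localhost:1433 with the same sa password):

        import pymssql

        # Connect to the SQL Server running inside the Docker container
        conn = pymssql.connect(server='localhost', port=1433,
                               user='sa', password='Passw1rd')

        cursor = conn.cursor()
        cursor.execute('SELECT name FROM sys.databases')
        for row in cursor.fetchall():
            print(row[0])

        conn.close()
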
Get External Tools to Manage Docker

Kitematic

https://kitematic.com/

Open Up the Firewall to connect to SQL Server from outside the Host

Ensure your firewall is configured to allow connections to the SQL Server. I turned off “Block all incoming connections” and enabled “Automatically allow downloaded signed software to receive incoming connections”. Without proper firewall configuration, you won’t be able to connect to the SQL Server from outside the host.

Ensure Firewall allows the incoming connections to the Docker
Connecting from the Internet ( Port forwarding Setup )

Let’s say you want to connect to the SQL Server you set up from outside the network, or from anywhere via the internet; you can set up port forwarding.

Get your public-facing IP and set up port forwarding for port 1433 (the SQL Server port you configured for your Docker container). If it’s set up correctly, you should be able to telnet into that port to verify the connectivity.

        telnet 69.11.122.159 1433 

Unless you absolutely require it, it’s a very bad idea to expose the SQL Server to the internet. It should stay behind the network; only your web server should be accessible via the internet.

Troubleshooting :

While launching the Docker container, if you get an error saying that it doesn’t have enough memory to launch the SQL Server container, go ahead and modify the memory allocation for the Docker VM.

  • This image requires Docker Engine 1.8+ in any of their supported platforms.
  • At least 3.25 GB of RAM. Make sure to assign enough memory to the Docker VM if you’re running on Docker for Mac or Windows.

I have setup this way.

Docker Memory configs

If you don’t provision enough memory, you will get an error like this.

Docker SQL Server Memory Error
Look into Docker logs

The following commands (docker ps -a and docker logs macsqlserver) show the list of running containers and the Docker logs.

        $ docker ps -a
CONTAINER ID        IMAGE                                      COMMAND                  CREATED             STATUS              PORTS                    NAMES
9ea3a24563f9        microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   About an hour ago   Up About an hour    0.0.0.0:1433->1433/tcp   macsqlserver
$ docker logs macsqlserver
2017-10-08 23:06:52.29 Server      Setup step is copying system data file 
'C:\templatedata\master.mdf' to '/var/opt/mssql/data/master.mdf'.
2017-10-08 23:06:52.36 Server      Setup step is copying system data file 
'C:\templatedata\mastlog.ldf' to '/var/opt/mssql/data/mastlog.ldf'.
2017-10-08 23:06:52.36 Server      Setup step is copying system data file 
'C:\templatedata\model.mdf' to '/var/opt/mssql/data/model.mdf'.
2017-10-08 23:06:52.38 Server      Setup step is copying system data file 
'C:\templatedata\modellog.ldf' to '/var/opt/mssql/data/modellog.ldf'.
 
Security:

I highly recommend creating least-privileged accounts and disabling the sa login. If you expose your SQL Server to the internet, there are a ton of hacking and pentest tools that use the sa login for brute-force attacks.
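
As a minimal sketch of that hardening for the Docker setup above, you can run sqlcmd inside the container (this assumes the image ships mssql-tools under /opt/mssql-tools; app_login and its password are placeholders, and you should grant it only the permissions it actually needs before disabling sa):

        $ docker exec -it macsqlserver /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P 'Passw1rd' \
            -Q "CREATE LOGIN app_login WITH PASSWORD = 'An0therStr0ngPassw0rd'; ALTER LOGIN sa DISABLE;"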

Bulk Load Data Files in S3 Bucket into Aurora RDS

We typically get data feeds from our clients (usually about 5–20 GB of data). We download these data files to our lab environment and use shell scripts to load the data into Aurora RDS. We wanted to avoid unnecessary data transfers, so we decided to set up a data pipeline to automate the process and use S3 buckets for file uploads from the clients.

In theory, setting up a data pipeline to load data from an S3 bucket into an Aurora instance is a very simple process. In practice, it is a convoluted, multi-step process; it’s not as simple as it sounds. Welcome to the managed-services world.

STEPS INVOLVED:
  • Create a Role and attach the S3 bucket policy
  • Create a cluster parameter group
  • Modify the custom parameter group to use the role
  • Reboot the Aurora instance

GRANT AURORA INSTANCE ACCESS TO S3 BUCKET

By default, Aurora cannot access S3 buckets, and we all know that is a common-sense default that reduces the attack surface for better security.

For EC2 machines you can attach a role, and the machines can then access other AWS services on behalf of the role assigned to the instance. The same method applies to Aurora RDS: you can associate a role with Aurora RDS that has the required permissions on the S3 bucket.

There is a ton of documentation on how to create a role and attach policies; it is a widely adopted best practice in the AWS world. Based on the AWS documentation, AWS rotates the access keys attached to these roles automatically. From a security standpoint, it is a lot better than using hard-coded access keys.

In the traditional datacenter world, you would typically run a few configuration commands to change configuration options (think of sp_configure in SQL Server).

In the AWS RDS world, it is trickier. By default, a configuration gets attached to your Aurora cluster. If you need to override any default configuration, you have to create your own DB cluster parameter group and modify your RDS instance to use the custom DB cluster parameter group you created. Then you can edit your configuration values.

The way you attach a role to Aurora RDS is through the cluster parameter group.

These three configuration options are related to interaction with S3 Buckets.

  • aws_default_s3_role
  • aurora_load_from_s3_role
  • aurora_select_into_s3_role

Get the ARN for your role and change the above configuration values from the default empty string to the role’s ARN.

Then you need to modify your Aurora instance and select the role; it should show up in the drop-down menu on the modify tab.
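
If you prefer scripting this over clicking through the console, a rough AWS CLI equivalent looks like the sketch below (the cluster identifier, parameter group name, account ID, and role name are placeholders):

        # attach the IAM role to the Aurora cluster
        $ aws rds add-role-to-db-cluster \
            --db-cluster-identifier my-aurora-cluster \
            --role-arn arn:aws:iam::123456789012:role/aurora-s3-access
        # point the S3-related parameter(s) in the custom cluster parameter group at that role
        # (repeat for aws_default_s3_role / aurora_select_into_s3_role as needed)
        $ aws rds modify-db-cluster-parameter-group \
            --db-cluster-parameter-group-name my-aurora-params \
            --parameters "ParameterName=aurora_load_from_s3_role,ParameterValue=arn:aws:iam::123456789012:role/aurora-s3-access,ApplyMethod=pending-reboot"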

GRANT AURORA LOGIN LOAD FILE PERMISSION
 
        
        GRANT LOAD FROM S3 ON *.* TO 'user'@'domain-or-ip-address';
        GRANT LOAD FROM S3 ON *.* TO 'aurora-load-svc'@'%';
REBOOT AURORA INSTANCE

Without a reboot, you will spend a lot of time troubleshooting. You need to reboot the Aurora instance for the new cluster parameter values to take effect.
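
The reboot can also be triggered from the AWS CLI (the instance identifier is a placeholder):

        $ aws rds reboot-db-instance --db-instance-identifier my-aurora-instance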

After this, you will be able to execute the LOAD DATA FROM S3 statement against Aurora.

Screenshots:
Create ROLE and Attach Policy:


Attach S3 Bucket Policy:

Create Parameter Group:

Modify Custom Parameter Groups:

Modify AURORA RDS Instance to use ROLE:

Troubleshooting:
Errors:

Error Code: 1871. S3 API returned error: Missing Credentials: Cannot instantiate S3 Client 0.078 sec

This usually means the Aurora instance can’t reach the S3 bucket. Make sure you have applied the role and rebooted the instance.

Sample BULK LOAD Command:

You can use the following sample scripts to test your setup.

 
        
        LOAD DATA FROM S3 's3://yourbucket/allusers_pipe.txt'
        INTO TABLE ETLStage.users
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '\n'
        (@var1, @var2, @var3, @var4, @var5, @var6, @var7, @var8, @var9, @var10, @var11, @var12, @var13, @var14, @var15, @var16, @var17, @var18)
        SET
        userid = @var1,
        username = @var2,
        firstname = @var3,
        lastname = @var4,
        city=@var5,
        state=@var6,
        email=@var7,
        phone=@var8,
        likesports=@var9,
        liketheatre=@var10,
        likeconcerts=@var11,
        likejazz=@var12,
        likeclassical=@var13,
        likeopera=@var14,
        likerock=@var15,
        likevegas=@var16,
        likebroadway=@var17,
        likemusicals=@var18 

Sample File in S3 Public Bucket : s3://awssampledbuswest2/tickit/allusers_pipe.txt

 
        
        SELECT * FROM ETLStage.users INTO OUTFILE S3 's3-us-west-2://s3samplebucketname/outputestdata'
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n'
        MANIFEST ON
        OVERWRITE ON; 
 
        
        create table users_01(
        userid integer not null primary key,
        username char(8),
        firstname varchar(30),
        lastname varchar(30),
        city varchar(30),
        state char(2),
        email varchar(100),
        phone char(14),
        likesports varchar(100),
        liketheatre varchar(100),
        likeconcerts varchar(100),
        likejazz varchar(100),
        likeclassical varchar(100),
        likeopera varchar(100),
        likerock varchar(100),
        likevegas varchar(100),
        likebroadway varchar(100),
        likemusicals varchar(100)) 

This is a continuation of the blog post; it covers how to set up and run Image2Docker on local machines.

Local Machines
  • This mode looks for IIS installed on the local machine and converts the IIS sites, virtual directories, and applications into a Dockerfile and associated artifacts.
  • Run the following command
     
                    
                     ConvertTo-Dockerfile `
                     -Local `
                     -OutputPath {{OutputPath}} `
                     -Artifact IIS `
                     -Verbose
  • The -Local parameter is used for IIS discovery on the local machine.
  • The -OutputPath parameter specifies the location to store the generated Dockerfile and associated artifacts.
  • The -Artifact parameter specifies which artifact to inspect; in our case this is IIS.
  • The -Verbose parameter is optional and prints all the verbose logs.
  • The following is a sample command:
     
                    
                    ConvertTo-Dockerfile -Local -OutputPath c:\docker_repo\iis -Artifact IIS -Verbose 

When it completes, the cmdlet generates a Dockerfile which turns that web server into a Docker image. The Dockerfile has instructions to install IIS and ASP.NET, copy in the website content, and create the sites in IIS.
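
From there, building and running the image is plain Docker. A quick sketch using the output path from the sample command above (the image tag is a placeholder, and this assumes a Windows container host):

        docker build -t my-iis-site c:\docker_repo\iis
        docker run -d -p 80:80 my-iis-site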

Disk Images
  • After installing the Image2Docker PowerShell module, you will need one or more valid .vhdx or .wim files (the “source image”). To perform a scan of a valid VHDX or WIM image file, simply call the ConvertTo-Dockerfile command and specify the -ImagePath parameter, passing in the fully-qualified filesystem path to the source image.
  • Run the following command
     
                    
                     ConvertTo-Dockerfile `
                     -ImagePath {{ImagePath}} `
                     -OutputPath {{OutputPath}} `
                     -Artifact IIS `
                     -Verbose
  • The -ImagePath parameter specifies the location of the disk image. {{ImagePath}} -> provide the path to a valid .vhdx or .wim image stored on the local machine. The disk image must be available locally.
  • The -OutputPath parameter specifies the location to store the generated Dockerfile and associated artifacts.
  • The -Artifact parameter specifies which artifact to inspect; in our case this is IIS.
  • The -Verbose parameter is optional and prints all the verbose logs.
  • The following is a sample command:
     
                    
                    ConvertTo-Dockerfile -ImagePath C:\vhds\qa-webserver-01.vhd `
                    -OutputPath c:\docker_repo\iis -Artifact IIS -Verbose

The qa-webserver-01.vhd contains two websites: one is an ASP.NET MVC app and the other is a Web API.

When the cmdlet completes, it generates a Dockerfile which turns that web server into a Docker image. The Dockerfile has instructions to install IIS and ASP.NET, copy in the website content, and create the sites in IIS.

Image2Docker extracts the website content for the ASP.NET MVC app and the Web API, and generates a Dockerfile containing the websites configured on the image file.

Cloud computing is providing developers and IT departments with the ability to focus on what matters most and avoid undifferentiated work like procurement, maintenance, and capacity planning. As cloud computing has grown in popularity, several different models and deployment strategies have emerged to help meet specific needs of different users. Each type of cloud service, and deployment method, provides you with different levels of control, flexibility, and management. Understanding the differences between Infrastructure as a Service, Platform as a Service, and Software as a Service, as well as what deployment strategies you can use, can help you decide what set of services is right for your needs.

 

 
Cloud Computing Models

There are three main models for cloud computing. Each model represents a different part of the cloud computing stack.

 
cloud-computing-models_iaas

 
Infrastructure as a Service (IaaS):

Infrastructure as a Service, sometimes abbreviated as IaaS, contains the basic building blocks for cloud IT and typically provides access to networking features, computers (virtual or on dedicated hardware), and data storage space. Infrastructure as a Service provides you with the highest level of flexibility and management control over your IT resources and is most similar to the existing IT resources that many IT departments and developers are familiar with today.

 
cloud-computing-models_paas

 
Platform as a Service (PaaS):

Platform as a Service removes the need for organizations to manage the underlying infrastructure (usually hardware and operating systems) and allows you to focus on the deployment and management of your applications. This helps you be more efficient, as you don’t need to worry about resource procurement, capacity planning, software maintenance, patching, or any of the other undifferentiated heavy lifting involved in running your application.

 
cloud-computing-models_saas

 
Software as a Service (SaaS):

Software as a Service provides you with a completed product that is run and managed by the service provider. In most cases, people referring to Software as a Service are referring to end-user applications. With a SaaS offering you do not have to think about how the service is maintained or how the underlying infrastructure is managed; you only need to think about how you will use that particular piece of software. A common example of a SaaS application is web-based email, where you can send and receive email without having to manage feature additions to the email product or maintain the servers and operating systems that the email program runs on.

We are evaluating pros and cons of different hosting solutions for SQL Server which best suits our business needs.

Our business needs

Our demand is predictable, seasonal demand. We are very small and can’t afford a dedicated team for managing database infrastructure (no DBA team). We face sky-high expectations from customers on availability and reliability for about 2 months of the year; a few minutes of downtime during the peak period can cause havoc to our business. We have a fixed budget with very little wiggle room. Our plan is to evaluate AWS SQL Server RDS, Azure RDS, and managed solutions from hosting providers, and to evaluate each option in these categories:

  1. Performance and Reliability
  2. Ability to scale up during peak loads
  3. Cost ( Based on Network , Storage, Memory and CPU )
  4. Operations Efficiency
  5. Compliance
Infrastructure Requirements:

  • SQL Server Enterprise Edition, since we use enterprise features
  • AlwaysOn Availability Groups for high availability
  • Geo-replication or a multi-availability-zone implementation for cloud-based databases
  • Ability to route read/write workloads
  • 128 GB RAM minimum
  • 1–2 TB of storage, with 500 GB of SSD for the TempDB database and high-volume tables
  • Memory-optimized OLTP support, which needs SQL Server 2016
  • Ability to handle ~30K IOPS during peak load

Amazon AWS SQL Server RDS

RDS pricing link: AWS SQL Server RDS pricing – http://www.ec2instances.info/rds/?selected=db.r3.8xlarge

Enterprise Edition, Single-AZ Deployment (price per hour)
Memory Optimized Instances – Current Generation
db.r3.2xlarge | $5.810
db.r3.4xlarge | $11.404
db.r3.8xlarge | $19.271

Enterprise Edition, Multi-AZ Deployment (price per hour)
Memory Optimized Instances – Current Generation
db.r3.2xlarge | $11.620
db.r3.4xlarge | $22.808
db.r3.8xlarge | $38.542

AWS SQL Server RDS configuration evaluated: On-Demand for SQL Server (License Included), Multi-AZ Deployment, Region: US East (N. Virginia), Memory Optimized Instances – Current Generation, priced per hour. RAM: 244 GB, 10 Gigabit network, 32 vCPU, 20,000 Provisioned IOPS.

db.r3.8xlarge | 244 GB | 2 x 320 SSD | Intel Xeon E5-2670 v2 (Ivy Bridge) | 32 vCPUs | 10 Gigabit

https://aws.amazon.com/rds/sqlserver/pricing/
Azure Pricing Calculator

Azure performance is measured in DTUs. We have been collecting our performance metrics during load tests. The following link provides a lightweight utility to convert perfmon counters to Azure DTUs.

Perfmon to Azure DTU calculator

Understanding DTUs, based on Microsoft’s definition: https://azure.microsoft.com/en-us/documentation/articles/sql-database-service-tiers/

The Database Transaction Unit (DTU) is the unit of measure in SQL Database that represents the relative power of databases based on a real-world measure: the database transaction. We took a set of operations that are typical for an online transaction processing (OLTP) request, and then measured how many transactions could be completed per second under fully loaded conditions.

Azure RDS Pricing Calculator
Azure SQL Server Pricing Calculator
Azure Options for SQL Server
https://azure.microsoft.com/en-us/pricing/details/sql-database/
Basic
eDTUs per pool | Max storage per pool | Max DBs per pool | Max eDTUs per database | Price
100 | 10 GB | 200 | 5 | ~$149/mo
200 | 20 GB | 400 | 5 | ~$298/mo
400 | 39 GB | 400 | 5 | ~$595/mo
800 | 78 GB | 400 | 5 | ~$1,198/mo
1200 | 117 GB | 400 | 5 | ~$1,800/mo

Standard
eDTUs per pool | Max storage per pool | Max DBs per pool | Max eDTUs per database | Price
100 | 100 GB | 200 | 100 | ~$223/mo
200 | 200 GB | 400 | 100 | ~$446/mo
400 | 400 GB | 400 | 100 | ~$900/mo
800 | 800 GB | 400 | 100 | ~$1,800/mo
1200 | 1.2 TB | 400 | 100 | ~$2,701/mo

Premium
eDTUs per pool | Max storage per pool | Max DBs per pool | Max eDTUs per database | Price
125 | 250 GB | 50 | 125 | ~$697/mo
250 | 500 GB | 50 | 250 | ~$1,399/mo
500 | 750 GB | 50 | 500 | ~$2,790/mo
1000 | 750 GB | 50 | 1000 | ~$5,580/mo
1500 | 750 GB | 50 | 1000 | ~$8,370/mo
AWS SLA Summary
AWS Service | SLA | Service Credit | Pct | SLA Resource
RDS | 99.95% | Less than 99.95% but equal to or greater than 99.0% | 10% | https://aws.amazon.com/rds/sla/
RDS | 99.95% | Less than 99.0% | 25% | https://aws.amazon.com/rds/sla/
S3 | 99.9% | Equal to or greater than 99.0% but less than 99.9% | 10% | https://aws.amazon.com/s3/sla/
S3 | 99.9% | Less than 99.0% | 25% | https://aws.amazon.com/s3/sla/
EC2 | 99.95% | Less than 99.95% but equal to or greater than 99.0% | 10% | https://aws.amazon.com/ec2/sla/
EC2 | 99.95% | Less than 99.0% | 30% | https://aws.amazon.com/ec2/sla/
Route 53 | 100% | 5 – 30 minutes in a Billing Cycle | 1 day Service Credit | https://aws.amazon.com/route53/sla/
Route 53 | 100% | 31 minutes – 4 hours in a Billing Cycle | 7 days Service Credit | https://aws.amazon.com/route53/sla/
Route 53 | 100% | More than 4 hours in a Billing Cycle | 30 days Service Credit | https://aws.amazon.com/route53/sla/
SLA Percentages
Availability % | Downtime/Month | Downtime/Week | Downtime/Day
90% (“one nine”) | 72 hours | 16.8 hours | 2.4 hours
95% | 36 hours | 8.4 hours | 1.2 hours
97% | 21.6 hours | 5.04 hours | 43.2 minutes
98% | 14.4 hours | 3.36 hours | 28.8 minutes
99% (“two nines”) | 7.20 hours | 1.68 hours | 14.4 minutes
99.5% | 3.60 hours | 50.4 minutes | 7.2 minutes
99.8% | 86.23 minutes | 20.16 minutes | 2.88 minutes
99.9% (“three nines”) | 43.8 minutes | 10.1 minutes | 1.44 minutes
99.95% | 21.56 minutes | 5.04 minutes | 43.2 seconds
99.99% (“four nines”) | 4.38 minutes | 1.01 minutes | 8.66 seconds
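
These downtime figures follow directly from the availability percentage. As a quick sanity check of the 99.95% row (assuming a 30-day month; the table’s 21.56 minutes simply uses a slightly shorter average month):

        # allowed downtime per 30-day month at 99.95% availability, in minutes
        $ echo '30*24*60*(1 - 0.9995)' | bc -l     # prints 21.6000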

Useful Links: Cloud Provider Service Availability – https://cloudharmony.com/status

According to AWS Documentation The first time a DB instance is started and accesses an area of disk for the first time, the process can take longer than all subsequent accesses to the same disk area. This is known as the “first touch penalty.” Once an area of disk has incurred the first touch penalty, that area of disk does not incur the penalty again for the life of the instance, even if the DB instance is rebooted, restarted, or the DB instance class changes. Note that a DB instance created from a snapshot, a point-in-time restore, or a read replica is a new instance and does incur this first touch penalty.

Reference: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html

I captured the number of cached pages per database on our SQL Server RDS instance, rebooted the instance, and captured the cached pages again. Based on the documentation, the cached pages should still be available and shouldn’t be affected by the reboot process. It’s such a neat feature, and it makes life a lot easier for DBAs responding to unexpected situations. But I noticed significant differences between the number of cached pages before and after the reboot.

Cached Pages Before and After SQL Server RDS Reboot
DBName | Bef_Buf_Pages | Size_MB | Aft_Buf_Pages | Size_MB
DatabaseOne | 558482 | 4363 | 596 | 4
DatabaseTwo | 1017 | 7 | 487 | 3
DatabaseThree | 609 | 4 | 201 | 1
master | 190 | 1 | 107 | 0
model | 18 | 0 | 37 | 0
msdb | 882 | 6 | 284 | 2
rdsadmin | 1253 | 9 | 87 | 0
DatabaseFour | 133 | 1 | 59 | 0
Resource Database | 1877 | 14 | 319 | 2
tempdb | 779280 | 6088 | 123 | 0

For DatabaseOne, cached pages dropped from 558482 to 596. I am not sure whether others have encountered the same issue, and I’m not sure what to make of the first-touch-penalty promise that the cache stays intact. Maybe it doesn’t hold for SQL Server RDS. 🙂

Structured, Semi-structured and Unstructured data

Big Data includes huge volume, high velocity, and an extensible variety of data. There are 3 types: structured data, semi-structured data, and unstructured data.

  1. Structured data is data whose elements are addressable for effective analysis. It has been organised into a formatted repository, typically a database. Example: a relational database.
  2. Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyse. With some processing it can be stored in a relational database (though that can be very hard for some kinds of semi-structured data), but the semi-structured form exists to ease storage. Example: XML data, JSON.
  3. Unstructured data is data that is not organised in a pre-defined manner or does not have a pre-defined data model, so it is not a good fit for a mainstream relational database. There are alternative platforms for storing and managing unstructured data; it is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications. Example: Word, PDF, text, media logs.

NoSQL (Not Only SQL database)

NoSQL is an approach to database design that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats. NoSQL, which stands for “not only SQL,” is an alternative to traditional relational databases, in which data is placed in tables and the data schema is carefully designed before the database is built. NoSQL databases are especially useful for working with large sets of distributed data.

Key-value stores, or key-value databases, implement a simple data model that pairs a unique key with an associated value.

Document databases, also called document stores, store semi-structured data and descriptions of that data in document format. They allow developers to create and update programs without needing to reference master schema. Use of document databases has increased along with use of JavaScript and the JavaScript Object Notation (JSON).

Wide-column stores organize data tables as columns instead of as rows.

Graph data stores organize data as nodes, which are like records in a relational database, and edges, which represent connections between nodes.

Couchbase

Couchbase Server, originally known as Membase, is an open-source, distributed (shared-nothing architecture) multi-model NoSQL document-oriented database software package that is optimized for interactive applications. Couchbase Server is designed to provide easy-to-scale key-value or JSON document access with low latency and high sustained throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.

Couchbase Inc. describes Couchbase as an Engagement Database, a new category of database that enables enterprises to continually create and reinvent the customer experience. Unlike traditional databases, the Engagement Database taps into dynamic data, at any scale and across any channel or device, to liberate data’s full potential at a time when the strategic use of data to create exceptional customer experiences has become a key competitive differentiator for businesses.

In the Engagement Database architecture, data is first cached in memory, replicated for availability, and then finally written to disk.

Core features of Couchbase

Data: Couchbase Server stores data as items. Each item consists of a key, by which the item is referenced; and an associated value, which must be either binary or a JSON document.

Buckets, Memory, and Storage: Items are stored in named buckets, kept either only in memory or both in memory and on disk.

Services: Services can be deployed to support different forms of data-access. Details are given in next section.

Clusters and Availability: A single node running Couchbase Server is considered a cluster of one node. As successive nodes are initialized, each can be configured to join the existing cluster.

Across the nodes of each cluster, Couchbase data is evenly distributed and replicated: nodes can be removed, and node-failure handled, without data-loss. Data can be selected for replication across clusters residing in different data centres, to ensure high availability.

Services

Couchbase Server provides the following services:

  1. Data: Supports the storing, setting, and retrieving of data-items, specified by key.
  2. Query: Parses queries specified in the N1QL query-language, executes the queries, and returns results. The Query Service interacts with both the Data and Index services.
  3. Index: Creates indexes, for use by the Query and Analytics services.
  4. Search: Creates indexes specially purposed for Full Text Search. This supports language-aware searching, allowing users to search for, say, the word beauties, and additionally obtain results for beauty and beautiful.
  5. Analytics: Supports join, set, aggregation, and grouping operations; which are expected to be large, long-running, and highly consumptive of memory and CPU resources.
  6. Eventing: Supports near real-time handling of changes to data: code can be executed both in response to document-mutations, and as scheduled by timers.

N1QL

N1QL (pronounced nickel), is used for manipulating the JSON data in Couchbase, just like SQL manipulates data in RDBMS. It has SELECT, INSERT, UPDATE, DELETE, MERGE statements to operate on JSON data.

The N1QL data model is non-first normal form (N1NF) with support for nested attributes and domain-oriented normalization. The N1QL data model is also a proper superset and generalization of the relational model.

Example
 {
          "email": "[email protected]",
          "friends": [
            {"name":"rick"},
            {"name":"cate"}
          ]
        }
  

Like Query
 SELECT * FROM `bucket` WHERE email LIKE "%@example.org";
  

Array Query
 SELECT * FROM `bucket` WHERE ANY x IN friends SATISFIES x.name = "cate" END;  

Programming model

Couchbase provides client libraries for different programming languages such as Java / .NET / PHP / Ruby / C / Python / Node.js

The following is the core API that Couchbase offers (in an abstract sense).

 # Get a document by key
        doc = get(key)
        
        # Modify a document, notice the whole document 
        #   need to be passed in
        set(key, doc)
        
        # Modify a document when no one has modified it 
        #  since my last read
        casVersion = doc.getCas()
        cas(key, casVersion, changedDoc)
        
        # Create a new document, with an expiration time 
        #   after which the document will be deleted
        addIfNotExist(key, doc, timeToLive)
        
        # Delete a document
        delete(key)
        
        # When the value is an integer, increment the integer
        increment(key)
        
        # When the value is an integer, decrement the integer
        decrement(key)
        
        # When the value is an opaque byte array, append more 
        #  data into existing value 
        append(key, newData)
        
        # Query the data 
        results = query(viewName, queryParameters)
  

Couchbase Java SDK

The code snippet below shows how the Java SDK may be used for some common operations:

 import com.couchbase.client.java.*;
        import com.couchbase.client.java.document.*;
        import com.couchbase.client.java.document.json.*;
        import com.couchbase.client.java.query.*;
        
        public class Example {
        
            public static void main(String... args) throws Exception {
        
                // Initialize the Connection
                Cluster cluster = CouchbaseCluster.create("localhost");
                cluster.authenticate("username", "password");
                Bucket bucket = cluster.openBucket("bucketname");
        
                // Create a JSON Document
                JsonObject arthur = JsonObject.create()
                    .put("name", "Arthur")
                    .put("email", "[email protected]")
                    .put("interests", JsonArray.from("Holy Grail", "African Swallows"));
        
                // Store the Document
                bucket.upsert(JsonDocument.create("u:king_arthur", arthur));
        
                // Load the Document and print it
                // Prints Content and Metadata of the stored Document
                System.out.println(bucket.get("u:king_arthur"));
        
                // Create a N1QL Primary Index (but ignore if it exists)
                bucket.bucketManager().createN1qlPrimaryIndex(true, false);
        
                // Perform a N1QL Query
                N1qlQueryResult result = bucket.query(
                    N1qlQuery.parameterized("SELECT name FROM `bucketname` WHERE $1 IN interests",
                    JsonArray.from("African Swallows"))
                );
        
                // Print each found Row
                for (N1qlQueryRow row : result) {
                    // Prints {"name":"Arthur"}
                    System.out.println(row);
                }
            }
        }
  

Spring Data Couchbase

The Spring Data Couchbase project provides integration with the Couchbase Server database. Key functional areas of Spring Data Couchbase are a POJO centric model for interacting with Couchbase Buckets and easily writing a Repository style data access layer.

1. Data Model

First create an entity class representing the JSON document to persist.

 @Document
        public class Person {
            @Id
            private String id;
             
            @Field
            @NotNull
            private String firstName;
             
            @Field
            @NotNull
            private String lastName;
             
            @Field
            @NotNull
            private DateTime created;
             
            @Field
            private DateTime updated;
             
            // standard getters and setters
        }
  

2. Couchbase Repository

We declare a repository interface for the Person class by extending CrudRepository<Person, String> and adding a derivable query method:

 public interface PersonRepository extends CrudRepository<Person, String> {
            List<Person> findByFirstName(String firstName);
        }
  

3. Service Layer

For our service layer, we define an interface and an implementation using the Spring Data repository abstraction. Here is our PersonService interface:

 public interface PersonService {
            Person findOne(String id);
            List<Person> findAll();
            List<Person> findByFirstName(String firstName);
             
            void create(Person person);
            void update(Person person);
            void delete(Person person);
        }
        
  

4. Service Implementation
 @Service
        @Qualifier("PersonRepositoryService")
        public class PersonRepositoryService implements PersonService {
             
            @Autowired
            private PersonRepository repo; 
         
            public Person findOne(String id) {
                return repo.findOne(id);
            }
         
            public List<Person> findAll() {
                List<Person> people = new ArrayList<>();
                Iterator<Person> it = repo.findAll().iterator();
                while(it.hasNext()) {
                    people.add(it.next());
                }
                return people;
            }
         
            public List<Person> findByFirstName(String firstName) {
                return repo.findByFirstName(firstName);
            }
         
            public void create(Person person) {
                person.setCreated(DateTime.now());
                repo.save(person);
            }
         
            public void update(Person person) {
                person.setUpdated(DateTime.now());
                repo.save(person);
            }
         
            public void delete(Person person) {
                repo.delete(person);
            }
        }
        
  

Spring Boot

Spring Boot is an open source Java-based framework used to create Micro Services. It is used to build stand-alone and production ready spring applications.

What is Micro Service?

Micro Service is an architecture that allows developers to develop and deploy services independently. Each service runs in its own process, and this achieves a lightweight model to support business applications.

Features and benefits of Spring Boot

  • Spring boot provides a flexible way to configure Java Beans, XML configurations, and Database Transactions.
  • It provides a powerful batch processing and manages REST endpoints.
  • In Spring Boot, everything is auto configured; no manual configurations are needed.
  • It offers annotation-based spring application.
  • Eases dependency management.
  • It includes Embedded Servlet Container.
  • It is highly dependent on the starter templates feature.

How Spring Boot works

Spring Boot automatically configures our application based on the dependencies we have added to the project, using the @EnableAutoConfiguration annotation. For example, if an embedded database such as HSQLDB is on our classpath and we have not configured any database connection beans, then Spring Boot auto-configures an in-memory database.

Spring Boot Starters

Handling dependency management is a difficult task for big projects. Spring Boot resolves this problem by providing a set of dependencies for developer’s convenience.

For example, if we want to create a web application with REST Endpoints, it is sufficient if we include spring-boot-starter-web dependency in our project.

Note that all Spring Boot starters follow the same naming pattern spring-boot-starter-*, where * indicates the type of application.

Example:

Spring Boot Starter Test dependency is used for writing Test cases. Its code is shown below:

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
        </dependency>

Spring Boot Application

The entry point of the Spring Boot Application is the class containing @SpringBootApplication annotation. This class should have the main method to run the Spring Boot application. @SpringBootApplication annotation includes @EnableAutoConfiguration, @ComponentScan, and @SpringBootConfiguration annotations.

Spring Boot automatically scans all the components included in the project by using @ComponentScan annotation.

Observe the following code for a better understanding:

                
        import org.springframework.boot.SpringApplication;
        import org.springframework.boot.autoconfigure.SpringBootApplication;
        
        @SpringBootApplication
        public class DemoApplication {
            public static void main(String[] args) {
                SpringApplication.run(DemoApplication.class, args);
            }
        }
                
        

Spring Boot – Quick Start – using Groovy

The Spring Boot CLI is a command-line tool that allows us to run Groovy scripts. Create a simple Groovy file which contains a REST endpoint script.

Hello.groovy

                
        @Controller
        class Example {
            @RequestMapping("/")
            @ResponseBody
            public String hello() {
                "Hello Spring Boot"
            }
        }
                
        

The above file can be run using the command “spring run Hello.groovy”

spring_command_img

Once we run the Groovy file, the required dependencies are downloaded automatically and the application starts on the embedded Tomcat server on port 8080, as shown in the screenshot above. You can also see that Spring logs ‘Mapped “{[/]}” onto public java.lang.String Example.hello()’.

We can go to the web browser and hit the URL http://localhost:8080/, and see the output from the hello() function as shown below:

hello_spring_img
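
You can also check it from the command line; a quick curl against the same URL should print the controller’s response:

        $ curl http://localhost:8080/
        Hello Spring Boot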

Click here to continue >>

1. Security and Compliance:

If you are wondering why we are starting with security, check out this number: $6 trillion. That’s the amount of annual damage cybercrime is predicted to cost us by 2021.

Which is precisely why the first thing you need to check while picking your cloud service provider is their security and compliance levels – both physical as well as virtual – this includes the geographical location of their data centers and the local laws of the country they are based in.

There are a number of certifications and standards which guarantee the security preparedness of cloud vendors; their validity must be checked and additional investigations must be carried out by checking internal and third-party audits or reports.

You need to do a deep check of:

  • Security infrastructure and procedures followed by the vendor
  • Identity management and authorizations
  • Physical security controls including the process for natural disasters
  • Policies for data back-up and disaster recovery

2. Technical Capabilities

An obvious point, but it still needs to be reiterated.

Your service provider should have a full stack of technologies that support your current applications and also has the capability to match your future needs.

Cloud partnerships last a long time, and it’s important to check the future roadmap of the service provider to understand if they have the mindset to catch trends early and innovate.

Some questions to focus on:

  • Will your current software and applications integrate easily with the service provider’s cloud infrastructure?
  • Do they use standard interfaces and APIs for easy integration?
  • Do they have the capability of providing hybrid cloud computing options and do they have the flexibility to host different cloud environments and systems?
  • Are they backing their capabilities with SLAs?
  • Are they willing and capable to architect solutions tailored to your business?

3. Costs

No two cloud service providers have similar or comparable pricing packages. They each have their own formula of computing cloud costs, and it is almost impossible to make a side-by-side comparison of different vendors. What you need to do is map out your organization’s requirement as minutely as possible and then decide which pricing model suits your needs.

Keep in mind:

  • Consumption timelines as long-term contracts are better priced
  • The flexibility offered by service providers in scaling up or down
  • Check for hidden costs

4. Business Health

The stability of your business depends on the stability of its partners, and you cannot underestimate the importance of a cloud partner. Before finalizing your cloud vendor, it is important to check their business and financial health.

You should check:

  • The company’s financial records
  • Management structure and other third-party relationships
  • Reputation, reviews, and referrals from existing customers
  • For any legal run-ins
  • All available third-party audits

5. Support

Do you just have phone or chat access, or does your service provider offer dedicated account management? How much support you can get from your vendor is another important criterion that must be considered before finalizing a service provider.

Find out about:

  • Time guarantees for solving technical issues
  • Access to support services – 24×7 or 12×5
  • Cost of opting for dedicated resources

Deciding on a cloud service provider is a long process that demands complete thoroughness and analysis from the CIO and the rest of the team.

Before we leave you to navigate your way to your future cloud partner, here are two more important points that must be considered – Right size and exit strategy.

Keep in mind that to get the best service you need to find a vendor who connects with you and for whom you are a valuable client.

And always plan an exit strategy in case things don’t work out.

Best of Luck!

When you are deploying a new change into production, the deployment should happen in a predictable manner. In simple terms, this means no disruption and zero downtime! In case you do encounter a problem or a bottleneck, the deployment strategy should include a quick rollback.

This safe strategy can be achieved by working with two identical infrastructures – the “green” environment hosting the current production and the “blue” environment with the new changes.

The business and IT teams will have an opportunity to conduct sanity, smoke test or any other test in the “blue” environment before making a “Go” decision. Upon “Go”, the team can switch “blue” to “green” and “green” to “blue”.

In Azure, different processes are available for implementing the Blue-Green strategy with two environments.

We have listed below some of these techniques. Naturally, this list is not fixed and will grow continuously as new tool sets and services emerge.

  • Deployment slots – For Web Apps, deployment slots provide an easy way to implement Blue-Green deployments (see the CLI sketch after this list).
  • Azure Traffic Manager – This can be leveraged to realize Blue-Green deployments for smoother releases using the weighted round-robin routing method. The detailed configuration and implementation methods are available in the Azure documentation.
  • Using an Application Gateway with two backend pools and a routing rule – Have two backend resource pools, one as a stage pool and the other as a prod pool. Add the stage VMSS to the stage pool, the prod VMSS to the prod pool, and have one routing rule in the Application Gateway. Depending on whether the stage or prod VMSS should receive traffic, this rule is changed to point to the appropriate backend address pool.
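
For Web Apps deployment slots, the swap itself can be scripted. The sketch below uses the Azure CLI; the resource group, app, and slot names are placeholders.

        # create a staging slot alongside the production app (names are placeholders)
        $ az webapp deployment slot create --resource-group myRG --name myApp --slot staging
        # after testing the new build in the staging slot, swap it into production
        $ az webapp deployment slot swap --resource-group myRG --name myApp --slot staging --target-slot production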

CloudIQ architects and engineers have implemented Blue-Green deployment for multiple clients, and in each case, we have customized our strategies to suit their use-cases. If you are looking for a completely safe way to deploy new software versions and applications, then reach out to us at [email protected]

There are several open-source tools available for managing infrastructure as code, backed by large communities of contributors, enterprise offerings, and good documentation. So why do we choose Terraform, and what makes it stand out? Terraform is used to provision infrastructure and manage infrastructure changes through versioned configuration. It can manage low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries.

Good Fit for a Cloud-Agnostic Strategy

Enterprises want to mitigate the availability risk of mission-critical systems in the cloud by spreading their services across multiple cloud providers. They also look for avenues to reduce infrastructure cost by avoiding vendor lock-in. Terraform is a savior for these use cases: being cloud-agnostic, it allows a single configuration to be used to manage multiple providers and even to handle cross-cloud dependencies, simplifying management and orchestration.

An Orchestration Tool

Chef, Puppet, Ansible, and SaltStack are all “configuration management” tools designed to install and manage software on existing servers, whereas Terraform is an “orchestration tool” designed to provision the servers themselves, leaving the configuration job to other tools. While there is some overlap between orchestration and configuration tools, each is a better fit for certain use cases. For example, when an infrastructure is dominated by containers and all you need to do is provision a bunch of servers, an orchestration tool like Terraform is typically going to be a better fit than a configuration management tool.
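
Whatever resources the configuration describes, the day-to-day workflow is the same few commands. A minimal sketch, run in a directory containing your *.tf files:

        # download the providers and modules referenced by the *.tf files in this directory
        $ terraform init
        # preview the changes Terraform would make and save the plan
        $ terraform plan -out=tfplan
        # apply exactly the plan that was reviewed
        $ terraform apply tfplan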

Combat Configuration Drift

While configuration tools are best known for combating configuration drift, they typically manage only a subset of a machine’s state, which leaves gaps in the overall infrastructure state. Management sees diminishing returns in closing those gaps compared to the work that matters most for daily operations. This set of issues can be mitigated by using Terraform along with containers.

For example, if you tell a configuration tool to install a new version of OpenSSL, it runs the software update on your existing servers and the changes happen in place. Over time, as you apply more and more updates, each server builds up a unique history of changes, causing configuration drift. If you’re using Docker and an orchestration tool such as Terraform, the Docker image is already built and ready for the new servers. New servers are deployed, the old servers are decommissioned, and all the server state is maintained by Terraform. This approach reduces the likelihood of configuration-drift bugs.

Conclusion

Overall, Terraform is an open source and cloud-agnostic orchestration tool with salient features. While it might be a less mature tool compared to other tools in the market, Terraform is still a good candidate to meet a specific set of requirements.

Apache Spark is an open-source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads.

Spark provides distributed task transmission, scheduling, and I/O functionality. It provides programmers with a potentially faster and more flexible alternative to MapReduce, the software framework to which early versions of Hadoop were tied.

How Apache Spark works

Apache Spark can process data from a variety of data repositories, including the Hadoop Distributed File System (HDFS), NoSQL databases and relational data stores.

The Spark Core engine uses the resilient distributed data set, or RDD, as its primary data type. The RDD is designed in such a way to hide much of the computational complexity from users. It aggregates data and partitions it across a server cluster, where it can then be computed and either moved to a different data store or run through an analytic model. The user doesn’t have to define where specific files are sent or what computational resources are used to store or retrieve files.

Given below is a sample Spark program written in Python to count the number of records with each rating in the input file:

 from pyspark import SparkConf, SparkContext
        import collections
        
        conf = SparkConf().setMaster("local").setAppName("RatingsHistogram")
        sc = SparkContext(conf = conf)
        
        lines = sc.textFile("file:///SparkCourse/ml-100k/u.data")
        ratings = lines.map(lambda x: x.split()[2])
        result = ratings.countByValue()
        
        sortedResults = collections.OrderedDict(sorted(result.items()))
        for key, value in sortedResults.items():
            print("%s %i" % (key, value))
        
         

In the above code, sc is the SparkContext associated with the input file u.data. ratings is an RDD created by mapping the 3rd column of the input file (array index [2] – the rating). Here map() is a transformation function which produces a new RDD.

We can have multiple transformations in a single spark program each producing a new RDD from an existing RDD or an input file. countByValue() is an Action that is performed.
In Spark, the transformations are not executed until an Action is triggered. This is called Lazy Evaluation.

Apache Spark works

Figure 1

Spark languages

Spark was written in Scala, which is considered the primary language for interacting with the Spark Core engine. Out of the box, Spark also comes with API connectors for using Java, R, and Python.

Spark libraries

  • The Spark Core engine functions partly as an application programming interface (API) layer and underpins a set of related tools for managing and analyzing data.
  • Spark SQL — One of the most commonly used libraries, Spark SQL enables users to query data stored in disparate applications using the common SQL language.
  • Spark Streaming — This library allows users to build applications that analyze and present data in real time.
  • MLlib — A library of machine learning code that enables users to apply advanced statistical operations to data in their Spark cluster and to build applications around these analyses.
  • GraphX — A built-in library of algorithms for graph-parallel computation.

RDDs, DataFrames, and Datasets

An RDD is an immutable distributed collection of elements of data, partitioned across nodes in a cluster that can be operated in parallel with a low-level API that offers transformations and actions.

Like an RDD, a DataFrame is an immutable distributed collection of data. However, unlike an RDD, data is organized into named columns, like a table in a relational database.

Datasets in Apache Spark are an extension of DataFrame API which provides type-safe, object-oriented programming interface.

Executing SQL-style functions on a Dataframe

Given below is a map-reduce program to get the list of popular movies (those rated by many customers), using the same input data as in Figure 1 above.

 from pyspark import SparkConf, SparkContext
        
        conf = SparkConf().setMaster("local").setAppName("PopularMovies")
        sc = SparkContext(conf = conf)
        
        lines = sc.textFile("file:///SparkCourse/ml-100k/u.data")
        movies = lines.map(lambda x: (int(x.split()[1]), 1))
        movieCounts = movies.reduceByKey(lambda x, y: x + y)
        
        flipped = movieCounts.map( lambda xy: (xy[1],xy[0]) )
        sortedMovies = flipped.sortByKey()
        
        results = sortedMovies.collect()
        
        for result in results:
            print(result)
        
         

The same program, when written using DataFrames, will look like this

 from pyspark.sql import SparkSession
        from pyspark.sql import Row
        from pyspark.sql import functions
        
        def loadMovieNames():
            movieNames = {}
            with open("ml-100k/u.ITEM") as f:
                for line in f:
                    fields = line.split('|')
                    movieNames[int(fields[0])] = fields[1]
            return movieNames
        
        # Create a SparkSession (the config bit is only for Windows!)
        spark = SparkSession.builder.config("spark.sql.warehouse.dir", 
        "file:///C:/temp").appName("PopularMovies").getOrCreate()
        # Load up our movie ID -> name dictionary
        nameDict = loadMovieNames()
        
        # Get the raw data
        lines = spark.sparkContext.textFile("file:///SparkCourse/ml-100k/u.data")
        # Convert it to a RDD of Row objects
        movies = lines.map(lambda x: Row(movieID =int(x.split()[1])))
        # Convert that to a DataFrame
        movieDataset = spark.createDataFrame(movies)
        
        # Some SQL-Style magic to sort all movies by popularity in one line!
        topMovieIDs = movieDataset.groupBy("movieID").count().orderBy("count", ascending=False).cache()
        
        # Show the results at this point:
        
        #|movieID|count|
        #+-------+-----+
        #|     50|  584|
        #|    258|  509|
        #|    100|  508|
        
        topMovieIDs.show()
        
        # Grab the top 10
        top10 = topMovieIDs.take(10)
        
        # Print the results
        print("\n")
        for result in top10:
            # Each row has movieID, count as above.
            print("%s: %d" % (nameDict[result[0]], result[1]))
        
        # Stop the session
        spark.stop()
        
         

As you can see, DataFrames give us the flexibility to use SQL-style functions to get the required results. Because the DataFrame API is built on top of the Spark SQL engine, it uses Catalyst to generate optimized logical and physical query plans.

Job Scheduling

Spark has several facilities for scheduling resources between computations.

  • Each Spark application (instance of SparkContext) runs an independent set of executor processes. The cluster managers that Spark runs on provide facilities for scheduling across applications.
  • Within each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads. This is common if the application is serving requests over the network. Spark includes a fair scheduler to schedule resources within each SparkContext.

Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards.

Spark Streaming

The network_wordcount.py example program bundled with Spark (run via spark-submit below) counts the number of words in text data received from a data server listening on a TCP socket.

Sample input entered for this program at a terminal through Netcat, and the output of the program, are given below.

 
     
# TERMINAL 1:
# Running Netcat

$ nc -lk 9999

hello world
...

         
 
     
# TERMINAL 2: RUNNING network_wordcount.py
        
$ ./bin/spark-submit examples/src/main/python/streaming/network_wordcount.py 
localhost 9999
...
-------------------------------------------
Time: 2014-10-14 15:25:21
-------------------------------------------
(hello,1)
(world,1)
...

         

Conclusion

Launched for the first time in May 2014, Apache Spark has become the go-to program for companies that work with large-scale Big Data applications. The speed and agility of Spark have made it incredibly useful across a wide range of industries.

From FMCG giants to BFSI companies to digital advertising firms – Apache Spark has proved to be indispensable when it comes to aggregating data, gleaning insights and forecasting industry trends.

Requirements:
  1. The first step in getting up and running is to install VirtualBox. You can get the appropriate version from www.virtualbox.org.
  2. Next, install Vagrant. The same procedure applies; grab the installer from www.vagrantup.com.

Now we can start the cluster setup. We need a Vagrantfile for the cluster; Vagrant uses it to bring everything up.

Or else, clone the git repository below to get a sample Vagrantfile:

https://github.com/coreos/coreos-vagrant

Now that everything is downloaded, let’s look at how to configure Vagrant for your CoreOS development environment.

  1. Make copies of the configuration files and rename them: copy user-data.sample to user-data, and copy config.rb.sample to config.rb.
  2. Open config.rb so that you can change a few parameters to get Vagrant up and running properly.
     # Size of the CoreOS cluster created by Vagrant
            $num_instances=2
             
  3. You may also want to tweak some other settings in config.rb. CPU and memory settings can be modified as per your needs.
     #Customize VMs
            $vm_gui = false
            $vm_memory = 1024
            $vm_cpus = 1
            $vb_cpuexecutioncap = 100
             

Then open Git Bash (or your preferred shell) to interact with Vagrant.

Go to your working directory in the shell and issue this command:

 vagrant up
         

You will see Vagrant bring the machines up; the output will look like this:

Once the operation is completed, you can verify that everything is up and running properly by logging in to one of the machines and using fleetctl to check the cluster:

 vagrant ssh core-01
        fleetctl list-machines
         

If you see the list of machines you created, then you are finished; you now have a local cluster of CoreOS machines.
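
When you are done experimenting, the cluster can be stopped or removed with the standard Vagrant commands:

        $ vagrant halt         # stop the CoreOS VMs but keep them on disk
        $ vagrant destroy -f   # delete the VMs entirely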

This is the fifth blog in our series helping you understand all about cloud, when you are in a dilemma to choose Azure or AWS or both, if needed.

Before we jumpstart on the actual comparison chart of Azure and AWS, we would like to bring you some basics on data analytics and the current trends on the subject.

If you would rather like to have quick look at the comparison table, Click here

This blog is intended to help you strategize your data analytics initiatives so that you can make the most informed decision possible by analyzing all the data you need in real time. Furthermore, we also will help you draw comparisons between Azure and AWS, the two leaders in cloud, and their capabilities in Big Data and Analytics as published in a handout released by Microsoft.

Beyond doubt, this is the era of data. Every touchpoint of your business generates volumes of data, and that data cannot simply be whisked away or cast aside, as valuable business insights can be unearthed with a little effort. Here’s where your data analytics infrastructure helps.

A 2017 Planning Guide for Data and Analytics, published by Gartner and written by the analyst John Hagerty, lays out where data and analytics are heading.

The key findings as per the report are as follows:

  • Data and analytics must drive modern business operations, not just reflect them. Technical professionals must holistically manage an end-to-end data and analytics architecture to acquire, organize, analyze and deliver insights to support that goal.
  • Analytics are now infused in places where they never existed before.
  • Executives will seek strategies to better manage and monetize data for internal and external business ecosystems.
  • Data gravity is rapidly shifting to the cloud, with IoT, data providers and cloud-native applications leading the way. It is no longer a question of “if” for using cloud for data and analytics; it’s “how.”

The last point emphasizes how prominent a role cloud plays when it comes to data analytics, and if you are wondering who leads and how, Gartner in its latest Magic Quadrant has named AWS and Azure the top leaders. Now, if you are in doubt whether to go the Azure way, the AWS way, or both, here’s the comparison table showing their respective Big Data and analytics capabilities:

 

Service | Description | AWS | Azure
Elastic data warehouse | A fully managed data warehouse that analyzes data using business intelligence tools. | Redshift | SQL Data Warehouse
Big data processing | Supports technologies that break up large data processing tasks into multiple jobs, and then combine the results to enable massive parallelism. | Elastic MapReduce (EMR) | HDInsight
Data orchestration | Processes and moves data between different compute and storage services, as well as on-premises data sources, at specified intervals. | Data Pipeline | Data Factory
 | Cloud-based ETL/data integration service that orchestrates and automates the movement and transformation of data from various sources. | AWS Glue Data Catalog | Data Factory + Data Catalog
Analytics | Storage and analysis platforms that create insights from massive quantities of data, or data that originates from many sources. | Kinesis Analytics | Stream Analytics, Data Lake Analytics, Data Lake Store
Streaming data | Allows mass ingestion of small data inputs, typically from devices and sensors, to process and route data. | Kinesis Streams, Kinesis Firehose | Event Hubs, Event Hubs Capture
Visualization | Performs ad-hoc analysis and develops business insights from data. | QuickSight (Preview) | Power BI
 | Allows visualization and data analysis tools to be embedded in applications. | – | Power BI Embedded
Search | A scalable search server based on Apache Lucene. | Elasticsearch Service | Marketplace—Elasticsearch
 | Delivers full-text search and related search analytics and capabilities. | CloudSearch | Search
Machine learning | Produces an end-to-end workflow to create, process, refine, and publish predictive models from complex data sets. | Machine Learning | Machine Learning
Data discovery | Provides the ability to better register, enrich, discover, understand, and consume data sources. | – | Data Catalog
 | A serverless interactive query service that uses standard SQL for analyzing databases. | Amazon Athena | Data Lake Analytics

Click here to read the entire guide published by Microsoft Azure Team:

This is the fourth blog in our series intended to help you embark on a cloud strategy, especially when you are in a dilemma over choosing AWS or Azure, the two prominent cloud players today.

If you missed our earlier blogs, click here:

1st Blog – Compute

2nd Blog- Storage

3rd Blog- CDN & Networking

Before we jump into the actual comparison chart of Azure and AWS, we would like to cover some basics on the database aspect of a cloud strategy.

If you would rather take a quick look at the database comparison table, click here.

Through this blog, let’s understand the database aspect of your cloud strategy. As per the guide, Database services refers to options for storing data, whether it’s a managed relational SQL database that’s globally distributed or a multi-model NoSQL database designed for any scale.

When you decide on the cloud, one of the critical decisions you face is which database to use: SQL or NoSQL. Though SQL has an impressive track record, NoSQL is not far behind, as it is gradually making notable gains and has many proponents. Once you have picked your database type, the other big decision is which cloud vendor to choose from among the many on offer.

Here’s where you consider Gartner’s prediction; the research company published a document that states

“Public cloud services, such as Amazon Web Services (AWS), Microsoft Azure and IBM Cloud, are innovation juggernauts that offer highly operating-cost-competitive alternatives to traditional, on-premises hosting environments.

Cloud databases are now essential for emerging digital business use cases, next-generation applications and initiatives such as IoT. Gartner recommends that enterprises make cloud databases the preferred deployment model for all new business processes, workloads, and applications. As such, architects and tech professionals should start building a cloud-first data strategy now, if they haven’t done so already”

Reinforcing the trend, Gartner recently published a new Magic Quadrant for infrastructure-as-a-service (IaaS) that, surprising nobody, has Amazon Web Services and Microsoft alone in the Leaders quadrant, with the other providers placed outside that box.

 

Now, the question really is, Azure or AWS for your cloud data? Or should it be both? Here’s a quick comparison table to guide you.

Service | Description | AWS | Azure
Relational database | SQL Database is a high-performance, reliable, and secure database you can use to build data-driven applications and websites, without needing to manage infrastructure. | RDS | SQL Database, including Postgres and MySQL
NoSQL—document storage | A globally distributed, multi-model database that natively supports multiple data models: key-value, documents, graphs, and columnar. | DynamoDB | Cosmos DB
NoSQL—key/value storage | A non-relational data store for semi-structured data. | DynamoDB and SimpleDB | Table Storage
Caching | An in-memory, distributed-caching service that provides a high-performance store typically used to offload non-transactional work from a database. | ElastiCache | Redis Cache
Database migration | Focuses on migration of database schema and data from one database format to a specific database technology in the cloud. | Database Migration Service (Preview) | SQL Database Migration Wizard
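
To make the first row of the table concrete, here is a minimal, hedged sketch of provisioning the managed relational option on each side with the official CLIs. It assumes the AWS CLI and Azure CLI are installed and authenticated; every name, size, region, and password below is a placeholder, not a recommendation from the guide.

 # Hypothetical names and sizes; adjust region, instance class, and credentials to your needs.
 # AWS: a managed PostgreSQL instance on RDS
 aws rds create-db-instance \
   --db-instance-identifier demo-postgres \
   --engine postgres \
   --db-instance-class db.t3.micro \
   --allocated-storage 20 \
   --master-username dbadmin \
   --master-user-password 'ChangeMe123!'

 # Azure: a resource group, a logical SQL server, and a SQL Database
 az group create --name demo-rg --location eastus
 az sql server create \
   --name demo-sql-server --resource-group demo-rg --location eastus \
   --admin-user dbadmin --admin-password 'ChangeMe123!'
 az sql db create \
   --resource-group demo-rg --server demo-sql-server \
   --name demo-db --service-objective S0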

Click here to read the entire guide published by Microsoft Azure Team:

In line with our latest blog series highlighting how common cloud services are made available via Azure and Amazon Web Services (AWS), as published by Microsoft, this third blog in the series helps you understand Cloud Networking and Content Delivery capabilities of both Azure and AWS.

Before we jump into the actual comparison chart of Azure and AWS, we would like to cover some basics on cloud content delivery networking and the current trends on the subject.

If you would rather take a quick look at the comparison table, click here.

When we talk about a cloud Content Delivery Network (CDN) and the related networking capabilities, we include all the hardware and software that allow you to easily provision private networks, connect your cloud applications to your on-premises datacenters, and more.

According to Gartner, Content delivery networks (CDNs) are a type of distributed computing infrastructure, where devices (servers or appliances) reside in multiple points of presence on multi-hop packet-routing networks, such as the Internet, or on private WANs. A CDN can be used to distribute rich media downloads or streams, deliver software packages and updates, and provide services such as global load balancing, Secure Sockets Layer acceleration and dynamic application acceleration via WAN optimization techniques.

In simpler terms, these highly distributed server platforms are optimized to deliver content in a way that improves the customer experience. Hence, it is important to decrease latency by keeping data closer to users and to protect it from security threats, while ensuring rapid, streamlined content delivery, including general web delivery, content purging, content caching, and tracking history for as long as 90 days.

As per G2Crowd.com, most organizations use CDN services, such as web caching, request routing, and server load balancing, to reduce load times and improve website performance. Further, to qualify as a CDN provider, a service provider must:

  • Allow access to a geographically dispersed network of PoPs in multiple data centers
  • Help websites access this network to deliver content to website visitors
  • Offer services designed to improve website performance
  • Provide scalable Internet bandwidth allowances according to customer needs
  • Maintain data center(s) of servers to reduce the possibility of overloading individual instances

With this background, let’s look at the AWS vs Azure comparison chart in terms of Networking and Content Delivery Capabilities:

Service | Description | AWS | Azure
Cloud virtual networking | Provides an isolated, private environment in the cloud. | Virtual Private Cloud | Virtual Network
Cross-premises connectivity | Connects Azure virtual networks to other Azure virtual networks or customer on-premises networks. It also supports VPN tunneling. | AWS VPN Gateway | VPN Gateway
Domain name system management | Manage DNS records using the same credentials, billing, and support contract as other Azure services. | Route 53 | DNS
Domain name system management | Service that hosts domain names, routes users to Internet applications, manages traffic to apps, and improves app availability with automatic failover. | Route 53 | Traffic Manager
Content delivery network | Global content delivery network that transfers audio, video, applications, images, and other files. | CloudFront | Content Delivery Network
Dedicated network | Establishes a dedicated, private network connection from a location to the cloud provider. | Direct Connect | ExpressRoute
Load balancing | Automatically distributes incoming application traffic to add scale, handle failover, and route to a collection of resources. | Elastic Load Balancing | Load Balancer; Application Gateway
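
As a quick, hedged illustration of the first row above (cloud virtual networking), the following sketch creates an isolated network on each platform with the official CLIs. The names, address ranges, and region are placeholders, and it assumes the AWS CLI and Azure CLI are installed and authenticated.

 # Placeholder names and address ranges; not taken from the Microsoft guide.
 # AWS: a Virtual Private Cloud
 aws ec2 create-vpc --cidr-block 10.0.0.0/16

 # Azure: a resource group and a Virtual Network
 az group create --name demo-rg --location eastus
 az network vnet create \
   --resource-group demo-rg --name demo-vnet \
   --address-prefixes 10.1.0.0/16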

To read more about the Microsoft guide, which covers the cloud by drawing comparisons between Azure and AWS, click here (link to PDF download).

You may also like to read our previous blogs in these series, if so, please click here:

https://www.cloudiqtech.com/azure-vs-aws-compute/
https://www.cloudiqtech.com/aws-vs-azure-cloud-storage/

Azure or AWS or Azure & AWS? What’s your cloud strategy for Storage?

This is the second blog in our latest series helping you understand the cloud, especially when you are in doubt whether to go with Azure, AWS, or both.

To read our first blog talking about Cloud strategy in general and Compute in particular, click here…

Moving on, in this blog let’s find out what Azure and AWS offer when it comes to storage capabilities for your cloud infrastructure.

Globally, CIOs are increasingly looking to stop running their own data centers and move to the cloud, which is evident in the projection made by a leading researcher, MarketsandMarkets. They reported that the global cloud storage market is expected to grow from $18.87 billion in 2015 to $65.41 billion by 2020, at a compound annual growth rate (CAGR) of 28.2 percent during the forecast period.

Reinforcing the fact, 451 Research’s Voice of the Enterprise survey last year stated that public cloud storage spending will double by next year (2017). “IT managers are recognizing the need for storage transformation to meet the realities of the new digital economy, especially in terms of improved efficiency and agility in the face of relentless data growth,” said Simon Robinson, research vice president at 451 and research director of the new Voice of the Enterprise: Storage service. “It’s clear from our Q4 study that emerging options, especially public cloud storage and all-flash array technologies, will be increasingly important components in this transformation,” he added.

As we see, many companies are undoubtedly in for cloud storage. But the big question still prevails: whom to choose from the gamut of leading public cloud players, including big players like Azure and AWS? Should it be Azure alone for your cloud storage, AWS, or a combination of both?

This needs a thorough understanding. To help you decide, we have reproduced a guide published by Microsoft that outlines Azure’s capabilities in comparison to AWS when it comes to cloud strategy. We will look at the storage part in this blog, but before that, a little background on cloud storage.

When we talk about cloud storage device mechanisms, we include all logical units of data storage, from files, blocks, and datasets to objects, along with their respective storage interfaces. These instances of virtual storage devices are designed specifically for cloud-based provisioning and can be scaled as needed. Note that different cloud service consumers utilize different technologies to interface with virtualized cloud storage devices.

Service | Description | AWS | Azure
Object storage | Object storage service for use cases including cloud apps, content distribution, backup, archiving, disaster recovery, and big data analytics. | Simple Storage Services (S3) | Storage—Block Blob (for content logs, files) (Standard—Hot)
Virtual server disk infrastructure | SSD storage optimized for I/O-intensive read/write operations. | Elastic Block Store (EBS) | Disk Storage—Page Blobs (for VHDs or other random-write type data); Disk Storage—Premium Storage
Shared file storage | A simple interface to create and configure file systems quickly, as well as share common files. | Elastic File System | File Storage (file share between VMs)
Archiving—cool storage | A lower-cost tier for storing data that is infrequently accessed and long-lived. | S3 IA; Glacier | Storage—Hot, Cool & Archive Tier
Backup | Backup and archival solutions that allow files and folders to be backed up and recovered from the cloud, and provide off-site protection against data loss. | Backup and Recovery | Backup
Hybrid storage | Integrates on-premises IT environments with cloud storage. Automates data management and storage, plus supports disaster recovery. | Storage Gateway | StorSimple
Bulk data transfer | A data transport solution that uses secure disks and appliances to transfer substantial amounts of data. | AWS Import/Export Disk | Import/Export
Bulk data transfer | Petabyte- to exabyte-scale data transport solution. | AWS Import/Export Snowball; AWS Snowball Edge; AWS Snowmobile | Data Box
Disaster recovery | Automates protection and replication of virtual machines with health monitoring, recovery plans, and recovery plan testing. | (not listed) | Site Recovery
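
To ground the object storage row above, here is a minimal, hedged sketch of creating a bucket/container and uploading a file with each provider’s CLI. The bucket, account, container, and file names are placeholders; it assumes the AWS CLI and Azure CLI are installed and authenticated, and that the storage account on the Azure side already exists.

 # Placeholder names; adjust to your own accounts and regions.
 # AWS: S3 bucket plus upload
 aws s3 mb s3://demo-bucket-12345
 aws s3 cp ./report.csv s3://demo-bucket-12345/report.csv

 # Azure: Blob Storage container plus upload (assumes a storage account named "demostorage")
 az storage container create --account-name demostorage --name demo-container
 az storage blob upload --account-name demostorage \
   --container-name demo-container --name report.csv --file ./report.csv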

For a more detailed understanding, download the document here.

Surprisingly, as per an article published by Gartner, “cloud computing is still perplexing to many CIOs even after a decade of cloud”. While cloud computing is a foundation for digital business, Gartner estimates that less than one-third of enterprises have a documented cloud strategy. This indeed comes as a surprise, given that the cloud has evolved from a disruption into the indispensable technology of today and tomorrow, strategically adopted by many progressive companies along the way.

In the same article Donna Scott, Vice President and distinguished analyst at Gartner states that “Cloud computing will become the dominant design style for new applications and for refactoring a large number of existing applications over the next 10-plus years”. She also added that “A cloud strategy clearly defines the business outcomes you seek, and how you are going to get there. Having a cloud strategy will enable you to apply its tenets quickly with fewer delays, thus speeding the arrival of your ultimate business outcomes.”

However, it is easier said than done. Many top businesses still have questions: How do we make the most of cloud computing? What kinds of architectures and techniques need to be strategized to support the many flavors of evolving cloud computing? Private or public? Hybrid or public? Azure or AWS, or should it be a hybrid combination?

Through a series of blogs, we intend to bring answers to these questions. As a first step, we would like to highlight a comparative cloud service map focusing on Azure and AWS, both leaders in public cloud platforms, as published by Microsoft.

The well-researched article draws detailed comparisons between Azure and AWS, showing how common cloud services across categories such as Marketplace, Compute, Storage, Networking, Database, Analytics, Big Data, Intelligence, IoT, Mobile, and Enterprise Integration are made available via Azure and Amazon Web Services (AWS).

It should be noted that, as prominent public cloud platform providers, Azure and AWS each offer businesses a wide and comprehensive set of capabilities across the globe. Many organizations have chosen one of them, or both, depending on their needs, in order to gain more agility and flexibility while minimizing risk and maximizing the larger benefits of a multi-cloud environment.

To begin, let’s start with Compute and the points one should consider and compare before deciding on the Azure or AWS approach, or a combination of both.

Service | Description | AWS | Azure
Virtual servers | Allows users to deploy, manage, and maintain OS and server software; instance types provide configurations of CPU/RAM. | Elastic Compute Cloud (EC2) VMs | Virtual Machines
Virtual servers | Offers a lightweight, simplified product offering users can choose from when building out a virtual machine. | Amazon Lightsail | Virtual Machine Images
Container management | Supports Docker containers and allows users to run applications on managed instance clusters. | EC2 Container Service (ECS) | Container Service
Container management | Allows customers to store Docker-formatted images. Used to create all types of container deployments on Azure. | EC2 Container Registry | Container Registry
Microservice-based applications | Orchestrates and manages the execution, lifetime, and resilience of complex, interrelated code components that can be either stateless or stateful. | (not listed) | Service Fabric
Backend process logic | Integrates systems and runs backend processes in response to events or schedules without provisioning or managing servers. | Lambda | Functions; Event Grid
Job orchestration | When processing across hundreds or thousands of compute nodes, this tool orchestrates the tasks and interactions between compute resources that are necessary. | AWS Batch | Batch
Scalability | Automatically changes the number of instances providing a compute workload. Users set defined metrics and thresholds that determine if the platform adds or removes instances. | AWS Auto Scaling | Virtual Machine Scale Sets; App Service Scale Capability (PaaS); AutoScaling
Pre-defined templates | Community-led templates for creating and deploying virtual machine-based solutions. | AWS Quick Start | Quickstart templates
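
To ground the virtual servers row, here is a minimal, hedged sketch of launching a single VM on each platform with the official CLIs. The AMI ID, image alias, sizes, and names are placeholders, and the commands assume the AWS CLI and Azure CLI are installed and authenticated.

 # Placeholder AMI ID, image, and names; not taken from the guide.
 # AWS: one EC2 instance
 aws ec2 run-instances \
   --image-id ami-0123456789abcdef0 \
   --instance-type t2.micro --count 1

 # Azure: one Virtual Machine
 az group create --name demo-rg --location eastus
 az vm create --resource-group demo-rg --name demo-vm \
   --image Ubuntu2204 --size Standard_B1s \
   --admin-username azureuser --generate-ssh-keys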

For a more detailed understanding, download the document here.

CloudIQ is a leading Cloud Consulting and Solutions firm that helps businesses solve today’s problems and plan the enterprise of tomorrow by integrating intelligent cloud solutions. We help you leverage the technologies that make your people more productive, your infrastructure more intelligent, and your business more profitable. 

US

626 120th Ave NE, Suite B102, Bellevue,

WA, 98005.

INDIA

Chennai One IT SEZ,

Module No:5-C, Phase ll, 2nd Floor, North Block, Pallavaram-Thoraipakkam 200 ft road, Thoraipakkam, Chennai – 600097


© 2025 CloudIQ Technologies. All rights reserved.
