Java Technologies | Latest versions: Java SE 17, Java EE 8. Spring/Spring Boot? Microservices?
Core Java, J2EE, Servlets, JSP, EJB, JMS, JNDI, JDBC, JTS.
Major Java frameworks: Spring, Apache Struts, Hibernate, Grails, JSF (JavaServer Faces), Wicket, GWT.
Design patterns: Singleton, DAO, DTO, MVC, Front Controller, Factory Method.
Service-oriented architecture/web services: SOAP/REST.
IDEs: NetBeans, Eclipse, MyEclipse, IntelliJ IDEA.
Servers: Apache Tomcat, GlassFish Server, JBoss Server, WebLogic Server.
Web technology: HTML, CSS, JavaScript, jQuery, AJAX.
Advanced JavaScript frameworks: Angular, React, Vue.js, Node.
Design architectures: MVC, MVVM, MVVM-C, MVC-C, MVP |
.NET Technologies | Latest versions: .NET Framework 4.8, .NET 5 |
VB.NET, ASP.NET, ADO.NET, VC++.NET, C#, MVC, COM, DCOM, Visual Studio. ASP.NET is the server-side web framework; C# and VB.NET are the languages used to build ASP.NET applications. |
Front-end |
HTML, CSS, JavaScript (Angular, React, Vue), jQuery, AJAX |
Back-end |
C/C++, C#, PHP, Python, Ruby, Java, SQL, Perl. Java (backend): J2EE, JSF, JSP, web services (REST, SOAP UI), Struts, Spring, Hibernate, Servlets, Spring Boot, Microservices, REST API. Other back-end frameworks: Express, Django, Rails, Laravel |
CMS (Content Management System) |
WordPress, Joomla, Drupal, Magento, Blogger, Shopify, Bitrix, TYPO3, Squarespace, PrestaShop, DotNetNuke |
Front-end JavaScript |
Angular, React, Vue, Ember, Polymer, Backbone, Aurelia, Mithril, Webix |
Back-end JavaScript |
Node.js, Express.js, Hapi, Feathers.js, Meteor.js, Total.js, Next.js. Node.js can be used in both the frontend and backend of applications. |
JS testing frameworks: |
Jest, Mocha, Jasmine, Cypress |
Databases: |
SQL (Structured Query Language)/Relational Databases (RDBMS): MySQL, PostgreSQL, Oracle 12c, Sybase, MS SQL Server (Microsoft SQL Server), Access, Ingres. NoSQL databases: MongoDB, Cassandra, Redis, HBase, Neo4j, Oracle NoSQL, DynamoDB, Couchbase, Memcached, CouchDB |
DevOps |
Version control tools: Git, SVN, Mercurial, CVS (JIRA is issue tracking, not version control). Build tools: Ant, Maven, Gradle. Automation testing: Selenium, TestNG, JUnit. Continuous integration: Jenkins. Containers: Docker, Kubernetes. Monitoring: Splunk, ELK Stack, Nagios, New Relic, Sensu |
Cloud |
Cloud - AWS, Azure, GCP. IaaS (Infrastructure-as-a-Service), PaaS (Platform-as-a-Service), SaaS (Software-as-a-Service) |
AWS Services |
EC2, RDS, S3, Lambda, CloudFront, EBS, EFS, Athena, CloudSearch, Elasticsearch, DynamoDB |
Azure cloud services |
Azure Active Directory (AD), Azure Content Delivery Network (CDN), Azure Data Factory, Azure Machine Learning, Azure HDInsight (Hadoop), Azure Data Lake Analytics, Azure SQL Database, Azure Functions, Cosmos DB, Azure DevOps, Azure Backup, Logic Apps, Virtual Machines. Azure Data Lake can be understood as a huge repository of data in its original form for big data analytics; it is a data storage and analytics service. Output data from Azure Data Factory can be published to Azure Data Lake for Business Intelligence (BI) applications for analytics and visualization. Azure Databricks: Databricks helps with real-time data streaming and data collaboration, integrated with ADF, ML, Synapse, and Power BI. Azure Data Factory: Azure's cloud ETL service |
Data warehouses |
Data lakes and data warehouses are both widely used for storing big data. Popular cloud data warehouses include Amazon Redshift and Snowflake (DynamoDB is a NoSQL database, not a warehouse) |
ETL (Extract, Transform, Load) |
Informatica PowerCenter, Talend, IBM InfoSphere DataStage, Pentaho, AWS Glue, Azure Data Factory |
Data Engineer/ Big Data Engineer: |
Raw data gathering, data flows, pipelines, ETL. Tools: Python, SQL, PostgreSQL, MongoDB, Apache Spark, Apache Kafka, Amazon Redshift, Snowflake, Amazon Athena, Apache Airflow |
Data Scientist/Data Analyst: |
Data cleaning, Analytics, Metrics |
Big Data/Hadoop |
HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Scala, Spark, Kafka, Flume, Ambari, Hue. Hadoop ecosystem (store and process data): HDFS, YARN, Scala, MapReduce, Hive, Pig, Zookeeper, Sqoop, Oozie, Bedrock, Flume, Kafka, Impala, NiFi, MongoDB, HBase. Hadoop platforms: Hortonworks, Cloudera, Azure, Amazon Web Services (AWS). Hadoop distributions: Cloudera CDH, Hortonworks HDP |
BI (Business Intelligence) |
Tableau, Power BI, QlikView |
Big data |
Huge data |
Hadoop |
framework used for storing and processing big data |
Kafka |
used to build real-time streaming data pipelines/ stream processing |
Spark |
large-scale data processing, written in Scala |
Scala |
Scala is used in data processing, distributed computing, and web development; it runs on the JVM (Java Virtual Machine) |
Snowflake |
A cloud data platform that can serve as both a data warehouse and a data lake |
SDLC stages |
SDLC is a systematic process for building software: 1) Requirement gathering and analysis, 2) Feasibility study, 3) Design, 4) Coding, 5) Testing, 6) Installation/Deployment, 7) Maintenance (bug fixing, upgrades) |
SDLC Methods/Model |
Waterfall, Agile (Scrum), Incremental, V-Model, RAD (Rapid Application Development), Spiral, Big Bang |
Agile methodologies / Scrum |
1. Extreme Programming (XP), 2. Feature-Driven Development (FDD), 3. Adaptive System Development (ASD), 4. Dynamic Systems Development Method (DSDM), 5. Lean Software Development (LSD), 6. Kanban, 7. Crystal Clear, 8. Scrum |
Java:
Spring = application development framework.
Spring Boot = a module of the Spring Framework, used to develop REST APIs and to build microservices (a minimal sketch follows below).
REST API = the interface style commonly used to expose microservices.
Microservices = a collection of smaller independent units.
While a monolithic application is built as a single unified unit, a microservices architecture breaks it down into a collection of smaller independent units. These units carry out every application process as a separate service.
Swagger = a tool for documenting APIs.
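A minimal sketch of such a Spring Boot REST endpoint, assuming the standard spring-boot-starter-web dependency; the class name and /greeting path are illustrative:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class GreetingService {

    // GET /greeting: one small, independently deployable endpoint.
    @GetMapping("/greeting")
    public String greeting() {
        return "Hello from a Spring Boot microservice";
    }

    public static void main(String[] args) {
        // Starts an embedded web server hosting this REST API.
        SpringApplication.run(GreetingService.class, args);
    }
}
```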
OOPs concepts (Object-Oriented Programming):
The four basics of OOP are abstraction, encapsulation, inheritance, and polymorphism, as the sketch below illustrates.
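A small, self-contained Java illustration of all four pillars (the Shape/Circle/Square names are just for the example):

```java
// Abstraction: Shape exposes area() without fixing how it is computed.
abstract class Shape {
    abstract double area();
}

// Inheritance: Circle is-a Shape. Encapsulation: radius is private.
class Circle extends Shape {
    private final double radius;

    Circle(double radius) { this.radius = radius; }

    // Polymorphism: each subclass overrides area() with its own logic.
    @Override
    double area() { return Math.PI * radius * radius; }
}

class Square extends Shape {
    private final double side;

    Square(double side) { this.side = side; }

    @Override
    double area() { return side * side; }
}

public class OopDemo {
    public static void main(String[] args) {
        Shape[] shapes = { new Circle(2.0), new Square(3.0) };
        for (Shape s : shapes) {
            // Dispatches to the right area() at runtime.
            System.out.println(s.area());
        }
    }
}
```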
Collection API and Stream API:
The Java Collections framework is used for storing and manipulating groups of data. The Stream API is used only for processing groups of data; it is part of Java 8. A short sketch of the two together:
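```java
import java.util.List;
import java.util.stream.Collectors;

public class StreamDemo {
    public static void main(String[] args) {
        // Collections framework: store a group of data.
        List<String> languages = List.of("Java", "Scala", "Kotlin", "Python");

        // Stream API (Java 8+): process the group without mutating it.
        List<String> jvmLangs = languages.stream()
                .filter(l -> !l.equals("Python")) // keep JVM languages
                .map(String::toUpperCase)          // transform each element
                .collect(Collectors.toList());

        System.out.println(jvmLangs); // [JAVA, SCALA, KOTLIN]
    }
}
```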
Version control, also known as source control, is the practice of tracking and managing changes to software code.
Messaging services: ActiveMQ, RabbitMQ, or Kafka.
Mobile app development:
Android development: Java, Kotlin, Git, XML, SQL, SDK, API, user interface, Android Studio. Kotlin: a language for Android application development. Android Studio: the official IDE for Android development. An SDK (software development kit) is a set of software tools and programs provided by hardware and software vendors that developers can use to build applications for specific platforms.
iOS development: Swift, Objective-C, Xcode, etc.
DevOps tools:
Containers are a form of operating-system virtualization used to package and deploy an application.
Continuous Development: planning and coding. The code can be written in any language, but it is maintained using version control tools: Git, SVN, Mercurial, CVS (with JIRA for issue tracking). Tools like Ant, Maven, and Gradle can be used in this phase for building/packaging the code into an executable file.
Continuous Testing: Selenium, TestNG, JUnit. Docker containers can be used for simulating the test environment.
Continuous Integration: a continuous integration tool such as Jenkins builds this code using Ant or Maven.
Continuous Deployment: this is the stage where the code is deployed to the production servers, using tools such as Puppet, Chef, SaltStack, and Ansible. Containerization tools also play an equally important role in the deployment stage: Docker and Vagrant.
Continuous Monitoring: the popular tools used for this are Splunk, ELK Stack, Nagios, New Relic, and Sensu. These tools help you monitor the application's performance and the servers.
DevOps tools:
1. Gradle
2. Git
3. Jenkins
4. Bamboo
5. Docker
6. Kubernetes
7. Puppet
8. Ansible
9. Nagios
10. Raygun
Data analyst: Excel, SQL, Tableau, Qlik Sense, data visualization, dashboards, reporting. R: statistical computing and graphics, data mining; used to clean, analyze, and graph your data. Data mining is the process of understanding data through cleaning raw data, finding patterns, creating models, and testing those models.
Power BI: data visualization/dashboards/reporting.
Data Engineer: develop, construct, test, and maintain data architectures. 1. Cloud infrastructure (AWS, Azure, Google Cloud); 2. Big data technologies (HDFS, Hive, Sqoop, Pig, Hadoop, Spark); 3. Data warehousing and various job schedulers; 4. ETL tools (Informatica, Talend); 5. Data streaming frameworks (Kafka, Spark Streaming, etc.); 6. Python, Java, R, Scala (the Pandas and NumPy libraries in Python); 7. Databases and SQL (Oracle, Greenplum, Teradata); data visualization tools (Tableau, QlikView, Spotfire); data quality tools or frameworks; Unix shell scripting; NoSQL databases and Elasticsearch; various machine learning algorithms used to solve regression, classification, and clustering problems.
Skills: Python, Java, R, Scala; Pandas and NumPy (Python libraries used in data analysis); databases and SQL (Oracle, NoSQL, Teradata); ETL tools (Informatica, Talend); big data technologies (HDFS, Hive, Sqoop, Pig, Hadoop, Spark); Kafka.
· ETL tools (Informatica, Talend)
· Databases and SQL (Oracle, Greenplum, Teradata)
· Big data technologies (HDFS, Hive, Sqoop, Pig, Hadoop, Spark)
· Cloud infrastructure (AWS, Azure, Google Cloud)
· Python, Java, R, Scala
· Pandas and NumPy
· Data visualization tools (Tableau, QlikView, Spotfire)
· Data warehousing and various job schedulers
· Data quality tools or frameworks
· Unix shell scripting
· NoSQL databases and Elasticsearch
· Data streaming frameworks (Kafka, NiFi, Spark Streaming, etc.)
· Various machine learning algorithms used to solve regression, classification, and clustering problems
Big Data: Big Data is a term for a collection of data that is huge in size and yet growing exponentially with time. Big Data analytics examples include stock exchanges, social media sites, jet engines, etc.
Apache
Hadoop is used to efficiently store and process large datasets.
Big Data/Hadoop: Hadoop is an open source, Java based framework used for storing and processing big data.
Kafka: Kafka is primarily used to build real-time streaming data pipelines, for real-time analysis, and to process real-time streams to collect Big Data; it serves streaming analytics, data integration, and mission-critical applications.
Spark: Apache Spark is used for large-scale data processing.
Scala: Scala is used in Data processing,
distributed computing, and web development.
Hadoop Ecosystem:
Some of the most well-known tools of
the Hadoop ecosystem include HDFS, Hive, Pig, YARN, MapReduce, Spark, HBase,
Oozie, Sqoop, Zookeeper.
Data Scientist: analytics and modeling, machine learning, data visualization, predictive analysis.
Skills: Python, SQL, KNIME, RapidMiner, SAS, Apache Spark, DataRobot, BigML, Go Spot Check, Mozenda, MATLAB.
· RapidMiner
· Apache Spark
· MySQL
· DataRobot
· BigML
· Go Spot Check
· Mozenda
· MATLAB
· Paxata
Azure cloud services
Azure Data Lake:
Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and to do all types of processing and analytics across platforms and languages. It can store and analyse petabyte-size files and trillions of objects. Azure Data Lake is a scalable data storage and analytics service, hosted in Azure.
Azure Databricks:
Azure Databricks is optimized for Azure and tightly integrated with Azure Data Lake Storage, Azure Data Factory, Azure Machine Learning, Azure Synapse Analytics, Power BI, and other Azure services, to store all of your data on a simple, open lakehouse and unify all of your analytics and AI workloads.
Azure Databricks is a fully managed platform service offering from Microsoft Azure; in a nutshell, it is a Big Data and Machine Learning platform.
Databricks provides a unified, open platform for all your data. It empowers data scientists, data engineers, and data analysts with a simple collaborative environment to run interactive and scheduled data analysis workloads.
Azure Data Factory:
Azure Data Factory is Azure's cloud ETL service for scale-out serverless data integration and data transformation.
ADF is generally used for data movement, ETL processes, and data orchestration, whereas Databricks helps with data streaming and data collaboration in real time.
Amazon Web Services:
EC2, RDS, S3, Lambda, CloudFront, EBS, EFS, Athena, CloudSearch, Elasticsearch, DynamoDB.
Compute Services: EC2
AWS Storage Services: Amazon Simple Storage Service (S3), Amazon Elastic Block Store (EBS), Amazon Elastic File System (EFS)
AWS Container Services: Amazon Elastic Container Service (ECS), Amazon ECS Anywhere, Amazon Elastic Kubernetes Service, Amazon EKS Distro
AWS Analytics Services: Amazon Athena, Amazon CloudSearch, Amazon Elasticsearch Service, Amazon EMR, Amazon FinSpace, Amazon Kinesis
AWS Database Services: Amazon Aurora, Amazon DynamoDB, Amazon DocumentDB, Amazon ElastiCache, Amazon Keyspaces, Amazon Neptune, Amazon Quantum Ledger Database, Amazon RDS, Amazon RDS on VMware, Amazon Redshift
AWS Security, Identity, and Compliance Services: AWS Identity & Access Management (IAM), Amazon Cognito, Amazon Detective, Amazon GuardDuty, Amazon Inspector
AWS Serverless Services: AWS Lambda, Amazon API Gateway, Amazon DynamoDB, Amazon EventBridge, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS)
ETL: Use ETL when you want to physically move data from multiple data sources into a single data warehouse (e.g., with Talend or Informatica). Data from one or more sources is extracted and then copied to the data warehouse. The ETL process plays a key role in data integration strategies: it is the process of moving raw data from one or more sources into a destination data warehouse, allowing businesses to gather data from multiple sources and consolidate it into a centralized location.
TIBCO: leading data integration solutions: Extract, Transform, and Load (ETL) and data virtualization.
MuleSoft: data integration, ESB. An Enterprise Service Bus (ESB) is fundamentally an architecture: a set of rules and principles for integrating numerous applications together over a bus-like infrastructure.
Programming Languages | C++, Java, J2EE/JEE, SQL/PLSQL |
Operating Systems | Windows 98/2000/XP/NT, Unix, MS-DOS, Linux |
Java Technologies | J2EE, JSP, Servlets, JDBC, JMS, MDB, JNDI, Web Services, JSF |
Web/App. Servers | Apache Tomcat 5.5 & 6.x, WebLogic 7.0/10.0, WebSphere 6.1, JBoss 4.5 |
Frameworks & Tools | Struts 1.1/2.0, JSF, Spring, MVC, ATG, Hibernate, JUnit, JPA, EasyMock, AJAX, Log4J, Eclipse, STS, TIBCO EMS |
Web Technologies | JSP, jQuery, XML, JSON, HTML5, XSLT, JavaScript, CSS, DHTML, Servlets, JSF, Ajax, REST, JSTL |
Databases | Oracle, DB2, Sybase, SQL Server, MySQL |
Design & Modeling | UML, Design Patterns, Microsoft Visio, Rational Rose 3.0, Agile SCRUM |
Tools/IDEs | RAD 7.5, NetBeans, Eclipse |
Build Tools | Ant, Maven |
Version Control Tools | CVS, SVN, Git |
JAVA (backend) | J2EE, JSF, JSP, web services (REST, SOAP UI), Struts, Spring, Hibernate, Servlets |
JAVA (frontend) UI | JavaScript, HTML, CSS, DHTML, REST API, Angular, jQuery, React.js, Bootstrap, Node.js |
Full stack is a combination of frontend and backend. |
.NET | ASP.NET, C#, VB.NET, Visual Studio, .NET Web Services, MVC, AJAX, Classic ASP, JavaScript, VBScript, HTML, DHTML, XML, CSS, jQuery, WebForms and WinForms |
QA | JBehave, Selenium Suite (Selenium IDE, Selenium WebDriver, Selenium Remote Control, Selenium Grid), HP QTP 8.0/8.2/9.0, Segue SilkTest 7.0, TestComplete, Robot Framework, Visual Studio 2010/2013 |
Generic Titles & Technologies
Business Analyst (BA) & Business Systems Analyst
Operating Systems: Linux (Alpine, Arch, Debian, Gentoo, Red Hat, Ubuntu), Mac OS, UNIX, Windows, AIX, FreeBSD
RDBMS / NoSQL: Oracle 10g, MS Access, MySQL, SQL Server, DB2, MongoDB.
Apple/iPhone development:
Environments: Xcode, Objective-C, Swift, Core Data, Realm, Storyboards, Bluetooth devices, Trello, Slack
Software Developer: implementation/coding; involved in all phases of software development (design/coding/unit testing).
Certification: it differs based on the development skill/tool.
Full stack is a combination of frontend and backend.
Front-end & Back-end:
Front-end: the interface between the user and the backend. The front-end is responsible for collecting inputs in various forms from the user; between the hardware and the end user there are several layers, and the front-end helps make the application user friendly. E.g., Java / HTML / ASP / Ruby / Mainframe.
Back-end: the database and server side, which the end user uses indirectly through an external application; it is used to store the data. E.g., SQL, Oracle, PostgreSQL, Sybase, DB2.
Front-End Technologies:
HTML/HTML5, CSS/CSS3, JavaScript (Node.js, AngularJS, ReactJS, Vue.js), jQuery, VBScript, AJAX, Twitter Bootstrap/Bootstrap 4, Express.js (framework for Node.js), ASP.NET (Active Server Pages .NET)
Back-End Technologies:
C/C++, C#, PHP, Python, Ruby on Rails (RoR), Java, SQL, Perl
Databases: MySQL, NoSQL, Oracle 12c, PostgreSQL, MongoDB, MariaDB, IBM DB2, SAP HANA
Databases:
A) SQL
B) NoSQL
The main difference between these two is that SQL databases, also called Relational Databases (RDBMS), have a relational structure, while NoSQL doesn't use relations. SQL databases are vertically scalable, which means one ultimate machine will do the work for you. On the other hand, NoSQL databases are horizontally scalable, which means multiple smaller machines will do the work for you.
A) SQL (Structured Query Language)/Relational Databases (RDBMS):
MySQL, PostgreSQL, Oracle 12c, Sybase, MS SQL Server (Microsoft SQL Server), Access, Ingres
B) NoSQL Databases:
1. MongoDB
2. Cassandra
3. Redis
4. HBase
5. Neo4j
6. Oracle NoSQL
7. DynamoDB
8. Couchbase
9. Memcached
10. CouchDB
CMS (Content Management System):
WordPress, Joomla, Drupal, Magento, Blogger, Shopify, Bitrix, TYPO3, Squarespace, PrestaShop, DotNetNuke; SQL (MySQL or PostgreSQL) and NoSQL (MongoDB or Cassandra)
Framework: a basic structure/platform.
A framework is a collection of programs that do something useful and which you can use to develop your own applications.
A framework guides you on how to do something (a predefined way of doing things).
Full Stack developer:
A full stack developer is a jack-of-all-trades across servers, databases, systems engineering, and client-facing work; they handle both front-end and back-end work.
Front end developer (Client side developer),
Skills/Technologies-
Design of user interface (UI) and user experience (UX), CSS, JavaScript, HTML,
and a growing collection of UI frameworks
Back end developer (Server side developer)
Skills/Technologies-
programming languages such as Java, C, C++, Ruby, Perl, Python, Scala, Go, etc.
Back-end developers often need to integrate with a vast array of services such
as databases, data storage systems, caching systems, logging systems, email
systems, etc.
Full Stack developer:
Design, develop, deploy, troubleshoot, and debug web solutions and back-end services; participate in the full development life cycle, including design, coding, testing, and production releases.
Database: SQL, PostgreSQL, NoSQL, MongoDB, Cassandra
Front-end is also referred to as the “client-side”, as opposed to the backend, which is basically the “server-side” of the application. The essentials of backend web development include languages such as Java, Ruby, Python, PHP, .NET, etc. The most common frontend languages are HTML, CSS, and JavaScript.
Front-end technologies
HTML5, CSS3, JavaScript, ES6
AngularJS, ReactJS, NodeJS
BackboneJS, JQuery, Vue.js
Ajax, Bootstrap, Webpack, GIT
Back-end Technologies:
The back-end has three parts to it: server, application, and database. In order to handle the back end of given applications, programmers or back-end developers deal with back-end technologies, which include languages like PHP, .NET, and Java.
Ruby, PHP, Java, .NET, Python, C++
MongoDB, MySQL, PostgreSQL
Express.js
#ReactJS vs React Native
React Native is a framework used to create mobile apps, whereas ReactJS is a JavaScript library you can use for your website. React Native is the same as React, but it uses native components instead of web components as building blocks. ReactJS is basically a JavaScript library, while React Native is an entire framework.
#Node.js can be used in both the frontend and backend of applications.
IDE (Integrated Development Environment): An IDE is an application used to write and compile code. A framework is generally a software component that someone else wrote that you can use/integrate into your own project, generally to avoid re-inventing the wheel. A framework is a tool that is closely attached to the language you are using and usually extends upon or adds to the language's features.
Java makes use of frameworks like Hibernate, Struts, and Spring to extend the language, and NetBeans or IntelliJ IDEA bring support for these tools to your Java project in a structured manner.
Visual Studio is an IDE and .NET is a framework.
IDEs, being development environments, are used to develop software programs from scratch.
NetBeans, Eclipse, IntelliJ.
Library and Framework:
Both libraries and frameworks are code written by some developer to solve a complicated problem efficiently. Their purpose is to increase the reusability of code so that you can use the same piece of code or functions again across your various projects.
What is a Library?
A library is a set of code that was previously written by a developer that you can call when you are building your project.
From a library, you import or call the specific methods that you need for your project.
In simple words, a bunch of code packed together that can be used repeatedly is known as a library.
Reusability is one of the main reasons to use libraries.
Let's understand this more clearly with the help of an example.
Think of yourself as a carpenter who needs to build a table.
You can build a table without the help of tools, but it's time-consuming and a long process; a library is like a ready-made set of tools you can pick up and use whenever you need them.
What is a Framework?
A framework is a supporting structure that gives shape to your code.
In a framework, you have to fill the structure in with your own code.
There is a specific structure for a particular framework that you have to follow, and it's generally more restrictive than a library.
One thing to remember here is that frameworks sometimes get quite large, so they may also use libraries; but a framework doesn't necessarily have to use a library.
Let's get back to our carpenter and table example for a better understanding of the framework.
Here, if you want to build a table, then you need a model or skeleton of how the table looks: the table has four legs and a top slab. This is the core structure of the table, and you have to work accordingly to build it.
Similar to this, a framework provides the structure, and you have to write your code accordingly, as the sketch below illustrates.
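A tiny Java sketch of that distinction, using only the JDK: with a library your code makes the call; with a framework you hand over code for the surrounding machinery to call (inversion of control). The Thread/Runnable pair stands in for a framework hook here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LibraryVsFramework {

    // Library style: YOUR code decides when to call the library.
    static List<Integer> sortScores(List<Integer> scores) {
        Collections.sort(scores); // we call Collections when we need it
        return scores;
    }

    public static void main(String[] args) {
        System.out.println(sortScores(new ArrayList<>(Arrays.asList(3, 1, 2))));

        // Framework style: you supply the code, the surrounding structure
        // decides when to run it. The Thread machinery invokes run() for us.
        Runnable hook = () -> System.out.println("the framework calls my code");
        new Thread(hook).start();
    }
}
```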
JavaScript framework:
A JavaScript framework is a collection
of pre-written JS code libraries that developers can access and use for
creating routine programming functions and features.
The basic use of a JS framework is for
building websites and web applications with ease.
Rather than writing the same code from
scratch, developers can utilize these code libraries for accessing these
programming blocks.
This obviously saves you a lot of time
which you can spend on creating other unique elements of your website.
When you're working with a JavaScript framework, you can simply search for functionality in the JS libraries and directly import its code into your site's code as required. This saves you time and energy.
There are a number of popular
frameworks with different features and uses, and you should choose the one that
fits perfectly for your needs at a given time.
Front-end JavaScript: React, Angular, Vue, Ember.js, Svelte.js, Backbone.js, Aurelia.js, Polymer.js, Mithril.js, Webix
Top 5 JavaScript Front-End Frameworks:
1. React.js
2. Vue.js 3. Angular.js 4. Ember.js 5. Polymer.js
Back-end JavaScript: Node.js, Express.js, Hapi, Feathers.js, Meteor.js, Total.js
Node.js frameworks are well known for creating REST APIs, desktop applications, and proxy servers.
Top 5 JavaScript Back-End Frameworks: 1. Express 2. Next.js 3. Meteor 4. Koa 5. Sails
JS testing frameworks: Jest, Mocha, Jasmine, Cypress.
Top Java frameworks: Spring, Apache Struts, Hibernate, Grails, JSF (JavaServer Faces), Wicket, GWT (Google Web Toolkit), Dropwizard, Play, Vaadin, Blade.
Top Java libraries: Project Lombok, Guava, jOOQ, Apache Lucene, Mockito, AssertJ.
What is Spring Boot used for?
Spring Boot is an open-source Java-based framework used to create microservices.
Spring Boot:
Spring Boot is a module of the Spring Framework. It allows us to build a stand-alone application with minimal or zero configuration.
Spring vs. Spring Boot:
The Spring Framework is a widely used Java EE framework for building applications; the Spring Boot framework is widely used to develop REST APIs.
API (Application Programming Interface): a software intermediary that allows two applications to talk to each other. An API is a set of definitions and protocols for building and integrating applications. Each time you use an app like Facebook, send an instant message, or check the weather on your phone, you're using an API. Examples: weather snippets, Pay with PayPal, Twitter bots.
Spring Boot is commonly used as the backend REST API; a client can call it as shown below.
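As an illustration of consuming such an API from Java, a sketch using the JDK 11 java.net.http client; the weather URL is a placeholder, not a real endpoint:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ApiClientDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Placeholder endpoint; substitute a real API URL.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/weather?city=Boston"))
                .GET()
                .build();

        // The API replies with a machine-readable payload (often JSON).
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```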
What is the difference between Web services
and Microservices?
In the simplest of terms,
microservices and web services are defined like this: Microservice: A
small, autonomous application that performs a specific service for a larger
application architecture. Web service: A strategy to make the services of one
application available to other applications via a web interface.
What is REST API vs SOAP?
REST APIs use multiple standards like HTTP, JSON, URL, and XML for data communication and transfer, while SOAP APIs are largely based on HTTP and XML only. Because REST deploys and uses multiple standards as stated above, it takes fewer resources and less bandwidth compared to SOAP.
SOAP = protocol; REST = architecture.
What is the difference between Docker and microservices?
We can understand the difference between Docker and microservices by an analogy: Docker is a cup, in other words a container, whereas a microservice is the liquid that you pour into it. You can pour different types of liquids into the same cup; similarly, you can run many microservices in the same Docker container.
HTML to define the content of web pages.
CSS to specify the layout of web pages.
JavaScript to program the behaviour of web pages.
AJAX stands for Asynchronous JavaScript And XML. In a nutshell, it
is the use of the XMLHttpRequest object to communicate with servers. It can
send and receive information in various formats, including JSON, XML, HTML, and
text files.
AJAX allows web pages to be updated asynchronously by exchanging
small amounts of data with the server behind the scenes. This means that it is
possible to update parts of a web page, without reloading the whole page.
JSON is a text format for storing and transporting data.
XML stands for extensible markup language. A markup language is a
set of codes, or tags, that describes the text in a digital document. The most
famous markup language is hypertext markup language (HTML), which is used to
format Web pages.
SQL is a standard language for storing, manipulating, and retrieving data in databases.
SQL stands for Structured Query Language; SQL lets you access and manipulate databases.
SQL can insert records in a database and update records in a database.
SQL can delete records from a database and create new databases.
SQL can create new tables in a database and create stored procedures in a database. A minimal JDBC sketch of these operations follows.
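A minimal JDBC sketch of those statements, assuming a reachable MySQL instance and the MySQL JDBC driver on the classpath; the connection string, table, and credentials are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqlDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for any JDBC-compatible database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/demo", "user", "password")) {

            try (Statement st = conn.createStatement()) {
                // CREATE: SQL can create new tables in a database.
                st.execute("CREATE TABLE IF NOT EXISTS employees ("
                        + "id INT PRIMARY KEY, name VARCHAR(100))");
            }

            // INSERT: SQL can insert records in a database.
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO employees (id, name) VALUES (?, ?)")) {
                ins.setInt(1, 1);
                ins.setString(2, "Ada");
                ins.executeUpdate();
            }

            // SELECT: SQL lets you access and retrieve data.
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, name FROM employees")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }
        }
    }
}
```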
A DBMS (Database Management System) is essentially a computerized data-keeping system. In a DBMS, the data is stored as a file, while in an RDBMS, the information is stored in tables. A DBMS can only be used by one single user, whereas multiple users can use an RDBMS.
Some DBMS examples include MySQL, PostgreSQL, Microsoft Access, SQL Server, FileMaker, Oracle, dBASE, Clipper, and FoxPro.
RDBMS stands for Relational Database Management System.
RDBMS is the basis for SQL, and for all modern database systems
such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
The data in RDBMS is stored in database objects called tables. A
table is a collection of related data entries and it consists of columns and
rows.
One key feature of an RDBMS is that it only keeps the tabular form of data: data in an RDBMS is stored and sorted in the form of rows and columns.
Popular relational databases: MS SQL Server, Oracle Database, MySQL, IBM Db2, Amazon Relational Database Service (RDS), PostgreSQL, SAP HANA, Amazon Aurora, MariaDB, Db2 Express-C, SQLite, CUBRID, Firebird, Oracle Database XE, SQL Server Express.
OLTP & OLAP:
An OLTP system is an accessible data processing system in today's enterprises. Some examples of OLTP systems include order entry, retail sales, and financial transaction systems.
Is SQL OLTP or OLAP?
OLTP and OLAP are both online processing systems: OLTP is an online database modifying system, whereas OLAP is an online database query answering system.
Python is commonly used for developing websites and software, task automation, data analysis, and data visualization.
It is used for: web development (server-side), software development, mathematics, and system scripting. Use Python for statistical analysis and to create data visualizations that show the big picture.
R is a programming language and free software environment for statistical computing and graphics, supported by the R Core Team and the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and performing data analysis; it supports statistical analysis, graphics representation, and reporting.
Machine learning is a method of
data analysis that automates analytical model building. It is a branch of
artificial intelligence based on the idea that systems can learn from data,
identify patterns and make decisions with minimal human intervention. Machine
learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being
explicitly programmed to do so. Machine learning algorithms use historical data
as input to predict new output values.
Big Data: Hadoop, NoSQL, Apache Spark, Apache Storm, Cassandra, RapidMiner, MongoDB
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.
Hadoop
is an open source, Java-based programming framework that supports the
processing and storage of extremely large data sets in a distributed computing
environment. It is part of the Apache project
sponsored by the Apache Software Foundation.
Cloud
Computing:
AWS
(Amazon Web Services)/ Microsoft Azure/Google Cloud Platform
IaaS
(Infrastructure-as-a-Service), PaaS (Platform-as-a-Service), SaaS
(Software-as-a-Service)
Azure is a public cloud computing platform, with solutions including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) that can be used for services such as analytics, virtual computing, storage, networking, and much more.
Azure DevOps is the evolution of VSTS (Visual Studio Team Services).
Simply
put, cloud computing is the delivery of computing services—servers, storage,
databases, networking, software, analytics and more—over the Internet (“the
cloud”). Companies offering these computing services are called cloud providers
and typically charge for cloud computing services based on usage, similar to
how you are billed for water or electricity at home.
Salesforce CRM:
The Salesforce cloud is an on-demand customer relationship management (CRM) suite offering applications for small, midsize, and enterprise organizations, with a focus on sales and support.
Amazon Web Services (AWS) is a secure cloud services
platform, offering compute power, database storage, content delivery and other
functionality to help businesses scale and grow.
Artificial Intelligence (AI):
NLP (Natural Language Processing), AI (Artificial Intelligence), ML (Machine Learning), DL (Deep Learning). DL is a subset of ML, which is itself a subset of AI.
ETL - Extract, Transform, Load
ETL is short for extract, transform, load: three database functions that are combined into one tool to pull data out of one database and place it into another. Extract is the process of reading data from a database; in this stage, the data is collected, often from multiple and different types of sources. Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database; transformation occurs by using rules or lookup tables or by combining the data with other data. Load is the process of writing the data into the target database.
How ETL Works
Data
from one or more sources is extracted and then copied to the data warehouse.
When dealing with large volumes of data and multiple source systems, the data
is consolidated.
ETL is used to migrate data from one database to another, and is often the specific process required to load data to and from data marts and data warehouses, but it is also used to convert (transform) large databases from one format or type to another.
ETL collects and redefines data, and delivers it to a data warehouse.
The process of ETL plays a key role in data integration strategies. It is the process of moving raw data from one or more sources into a destination data warehouse. ETL allows businesses to gather data from multiple sources and consolidate it into a centralized location. This is essential in making the data analysis-ready in order to have a seamless business intelligence system in place; a toy sketch of the three steps follows.
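As a toy illustration only: the three ETL steps over flat files in plain Java. Real pipelines would use a tool like Informatica, Talend, or AWS Glue; the file names and two-column CSV layout here are assumptions:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class EtlSketch {
    public static void main(String[] args) throws Exception {
        // Extract: read raw records from a source (file name is illustrative).
        List<String> raw = Files.readAllLines(Path.of("sales_raw.csv"));

        // Transform: clean and reshape. Drop the header row, trim fields,
        // and normalise the assumed second column (a region code) to upper case.
        List<String> cleaned = raw.stream()
                .skip(1)
                .map(line -> {
                    String[] f = line.split(",");
                    return f[0].trim() + "," + f[1].trim().toUpperCase();
                })
                .collect(Collectors.toList());

        // Load: write the consolidated records to the "warehouse" target.
        Files.write(Path.of("sales_warehouse.csv"), cleaned);
    }
}
```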
Popular cloud data warehouses include Amazon Redshift and Snowflake.
1. Informatica PowerCenter
2. IBM InfoSphere DataStage
3. Talend
4. Pentaho
5. AWS Glue
6. Azure Data Factory
Kafka is written in Scala and Java and is often associated with real-time event stream processing for big data.
Kafka is open source software which provides a framework for storing, reading, and analysing streaming data.
Kafka is used to stream data into data lakes, applications, and real-time stream analytics systems.
Kafka is used heavily in the big data space as a reliable way to ingest and move large amounts of data very quickly; a minimal producer sketch follows.
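A minimal sketch of publishing events to Kafka with the official Java client (kafka-clients dependency assumed); the broker address, topic, and record values are illustrative:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address is a placeholder for your cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each record is appended to a topic that consumers read as a stream.
            producer.send(new ProducerRecord<>("page-views", "user-42", "/home"));
        }
    }
}
```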
Snowflake is an analytic data warehouse provided as
Software-as-a-Service (SaaS). Snowflake provides a data warehouse that is
faster, easier to use, and far more flexible than traditional data warehouse
offerings.
ETL is a type of data integration that refers to the three steps (extract, transform, load) used to blend data from multiple sources. It's often used to build a data warehouse.
ETL, for extract, transform, and load, is a data integration process that combines data from multiple data sources into a single, consistent data store.
Amazon Web Services (AWS) - cloud computing platforms
AWS Lambda – Serverless Compute - Amazon Web Services
AWS Lambda is an event-driven, serverless computing platform
provided by Amazon as a part of Amazon Web Services. It is a computing service
that runs code in response to events and automatically manages the computing
resources required by that code. It was introduced in November 2014.
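A minimal sketch of a Java Lambda handler, assuming the aws-lambda-java-core dependency; the event shape (a simple string map) and names are illustrative:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;

// AWS invokes handleRequest in response to an event and manages
// the underlying compute for you; there is no server to provision.
public class HelloLambda implements RequestHandler<Map<String, String>, String> {

    @Override
    public String handleRequest(Map<String, String> event, Context context) {
        String name = event.getOrDefault("name", "world");
        context.getLogger().log("invoked with name=" + name);
        return "Hello, " + name;
    }
}
```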
Software architect: an expert who makes high-level design choices and tries to enforce technical standards, including software coding standards, tools, and platforms.
Scala vs Python vs Spark
Scala is frequently over 10 times faster than Python. In the case of Python, Spark libraries are called, which requires a lot of code processing and hence gives slower performance. In this scenario Scala works well for limited cores. Moreover, Scala is native for Hadoop as it's based on the JVM.
The difference between Spark and Scala is that Apache Spark is a cluster computing framework designed for fast Hadoop computation, while Scala is a general-purpose programming language that supports functional and object-oriented programming. Scala is one language that is used to write Spark.
Is Scala faster than PySpark?
However, this is not the only reason why PySpark is a better choice than Scala. There's more. The Python API for Spark may be slower on the cluster, but in the end, data scientists can do a lot more with it compared to Scala. It aids in data analysis and has statistics libraries that are much more mature and time-tested.
SAP BW on HANA
SAP Business Warehouse (BW) powered by HANA also known as BW on
HANA (BWoH) is SAP's data modeling, warehousing, and reporting tool built on
the HANA database. SAP BW on HANA (BWoH) runs on the HANA database and
therefore it is simpler and faster.
SAP BW is also a development platform that programmers use to create and modify data warehouses, perform data
management tasks, generate reports and develop analytics applications. Business
users typically access SAP BW through an application built by a developer, such
as an executive Dashboard or mobile app.
SAP BW on HANA and SAP BW/4HANA are different application suites
running on the same database. BW on HANA uses SAP's legacy BW software, but
moves it to the HANA database, while BW/4HANA uses a re-engineered software
suite designed to fully harness the power of the HANA database.
Types of Big Data Technologies
Before starting with the list of technologies, let us first see the broad classification of all these technologies. They can mainly be classified into 4 domains:
· Data storage
· Analytics
· Data mining
· Visualization
Data analyst skills:
Structured Query Language (SQL)
Microsoft Excel
R or Python (statistical programming)
Data visualization
Machine learning
Big Data definition: Big Data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time. Big Data analytics examples include stock exchanges, social media sites, jet engines, etc.
TOP 5 BIG DATA TECHNOLOGIES
1. Hadoop Ecosystem
Hadoop Framework was developed to store and process data with a simple programming model in a distributed data processing environment. The data present on different high-speed and low-expense machines can be stored and analyzed. Enterprises have widely adopted Hadoop as a Big Data technology for their data warehouse needs in the past year. The trend seems to continue and grow in the coming year as well. Companies that have not explored Hadoop so far will most likely see its advantages and applications.
2. Artificial Intelligence
Artificial Intelligence is a broad bandwidth of computer
technology that deals with the development of intelligent machines capable of
carrying out different tasks typically requiring human intelligence. AI is
developing fast from Apple’s Siri to self-driving cars. As an interdisciplinary
branch of science, it takes into account a number of approaches such as
increased Machine Learning and Deep Learning to make a remarkable shift in most
tech industries. AI is revolutionizing the existing Big Data Technologies.
3. NoSQL Database
NoSQL includes a wide variety of different Big Data Technologies
in the database, which are developed to design modern applications. It shows a
non-SQL or non-relational database providing a method for data acquisition and
recovery. They are used in Web and Big Data Analytics in real-time. It stores
unstructured data and offers faster performance and flexibility while
addressing various data types—for example, MongoDB, Redis and Cassandra. It
provides design integrity, easier horizontal scaling and control over
opportunities in a range of devices. It uses data structures that are different
from those concerning databases by default, which speeds up NoSQL calculations.
Facebook, Google, Twitter, and similar companies store terabytes of user data daily.
4. R Programming
R is one of the open-source Big Data Technologies and programming
languages. The free software is widely used for statistical computing,
visualization, unified development environments such as Eclipse and Visual
Studio assistance communication. According to experts, it has been the world’s
leading language. The system is also widely used by data miners and
statisticians to develop statistical software and mainly data analysis.
5. Data Lakes
Data Lakes means a consolidated repository for storage of all data
formats at all levels in terms of structural and unstructured data.
Data can be saved during data accumulation as-is, without being transformed into structured data. This enables performing numerous types of data analysis, from dashboards and data visualization to big data transformation in real time, for better business inference.
Businesses that use Data Lakes stay ahead in the game from their
competitors and carry out new analytics, such as Machine Learning, through new
log file sources, data from social media and click-streaming.
This Big Data technology helps enterprises respond to better business growth opportunities by understanding and engaging clients, sustaining productivity, maintaining devices proactively, and making informed decisions.
EMERGING BIG DATA TECHNOLOGIES
1. TensorFlow
TensorFlow has a robust, scalable ecosystem of resources, tools,
and libraries for researchers, allowing them to create and deploy powerful
Machine Learning applications quickly.
2. Beam
Apache Beam offers a compact API layout to create sophisticated
Parallel Data Processing pipelines through various Execution Engines or
Runners. Apache Software Foundation developed these tools for Big Data in the
year 2016.
3. Docker
Docker is one of the tools for Big Data that makes the
development, deployment and running of container applications simpler.
Containers help developers stack an application with all of the components they
need, such as libraries and other dependencies.
4. Airflow
Apache Airflow is a process management and scheduling system for data pipelines. Airflow utilizes job workflows made up of DAGs (Directed Acyclic Graphs) of tasks. The code description of workflows makes it easy to manage, validate, and version large amounts of data.
5. Kubernetes
Kubernetes is one of the open-source tools for Big Data developed
by Google for vendor-agnostic cluster and container management. It offers a
platform for the automation, deployment, escalation and execution of container
systems through host clusters.
6. Blockchain
Blockchain is the Big Data technology that carries a unique data
safe feature in the digital Bitcoin currency so that it is not deleted or
modified after the fact is written. It’s a highly secured environment and an
outstanding option for numerous Big Data applications in various industries
like baking, finance, insurance, medical and retail, to name a few.
----
Apache Hadoop: the topmost big data tool.
Apache Spark: another popular open-source big data tool, designed to speed up Hadoop big data processing.
MongoDB, Apache Cassandra, Apache Kafka, QlikView, Qlik Sense, Tableau.
Data warehousing:
A data warehouse is constructed by integrating data from multiple
heterogeneous sources that support analytical reporting, structured and/or ad
hoc queries, and decision making. Data warehousing involves data cleaning, data
integration, and data consolidations.
In computing, a data warehouse (DW or DWH), also known as an
enterprise data warehouse (EDW), is a system used for reporting and data
analysis and is considered a core component of business intelligence. DWs are
central repositories of integrated data from one or more disparate sources
What is a data warehouse vs. a database?
A database is any collection of data organized for storage, accessibility, and retrieval. A data warehouse is a type of database that integrates copies of transaction data from disparate source systems and provisions them for analytical use.
What is a data warehouse and how does it work?
A data warehouse contains data from many operational sources and is used to analyze data. Data warehouses are analytical tools, built to support decision making and reporting for users across many departments. Data warehouses work to create a single, unified system of truth for an entire organization.
Data modeling:
Data modeling is the process of creating a visual representation
of either a whole information system or parts of it to communicate connections
between data points and structures.
Big Data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time. Big Data analytics examples include stock exchanges, social media sites, jet engines, etc.
Apache Hadoop is an open source framework that is used to efficiently
store and process large datasets ranging in size from gigabytes to petabytes of
data. Instead of using one large computer to store and process the data, Hadoop
allows clustering multiple computers to analyze massive datasets in parallel
more quickly.
What is Hadoop and Big Data?
Hadoop is an open source, Java based framework used for storing and processing big data. The data is stored on inexpensive commodity servers that run as clusters. Created by Doug Cutting and Mike Cafarella, Hadoop uses the MapReduce programming model for faster storage and retrieval of data from its nodes.
Hadoop is a framework which allows users to save and process Big Data.
Kafka:
Kafka can handle huge volumes of data and remains responsive; this makes Kafka the preferred platform when the volume of the data involved is big to huge. Kafka can be used for real-time analysis as well as to process real-time streams to collect Big Data.
Apache Kafka is an open-source distributed event streaming
platform used by thousands of companies for high-performance data pipelines,
streaming analytics, data integration, and mission-critical applications.
What is Kafka used for?
Kafka is primarily used to build real-time streaming data pipelines and
applications that adapt to the data streams. It combines messaging, storage,
and stream processing to allow storage and analysis of both historical and
real-time data.
What is difference between
Kafka and MQ?
While ActiveMQ (like IBM MQ or JMS in general) is used for traditional messaging, Apache Kafka is used as a streaming platform (messaging + distributed storage + processing of data). Both are built for different use cases. You can use Kafka for "traditional messaging", but not MQ for Kafka-specific scenarios.
Kafka is a message broker; Spark is an open-source processing platform. Kafka has producers, consumers, and topics to work with data. Kafka is used for real-time streaming as a channel or mediator between source and target.
Spark: a fast and general engine for large-scale data processing.
Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance; a minimal Java sketch follows.
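A minimal sketch of that interface using Spark's Java API (spark-sql dependency assumed); the CSV file and column names are illustrative, and local[*] simply runs on all local cores for testing:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkDemo {
    public static void main(String[] args) {
        // On a real cluster you would submit this with spark-submit
        // instead of hard-coding a local master.
        SparkSession spark = SparkSession.builder()
                .appName("SalesByRegion")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> sales = spark.read()
                .option("header", "true")
                .csv("sales.csv");

        // The aggregation is distributed across the executors.
        sales.groupBy("region").count().show();

        spark.stop();
    }
}
```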
Scala:
Scala is used in Data processing, distributed computing, and web development. It
powers the data engineering infrastructure of many companies.
Hadoop Ecosystem:
Some of the most well-known tools of the Hadoop ecosystem include HDFS,
Hive, Pig, YARN, MapReduce, Spark, HBase, Oozie, Sqoop, Zookeeper.
Databricks provides a unified, open platform for all your data. It empowers data scientists, data engineers, and data analysts with a simple collaborative environment to run interactive and scheduled data analysis workloads.
Databricks is an enterprise software company founded by the creators of Apache Spark. The company has also created Delta Lake, MLflow, and Koalas, open source projects that span data engineering, data science, and machine learning.
Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services platform. For a big data pipeline, the data (raw or structured) is ingested into Azure through Azure Data Factory in batches, or streamed near real-time using Apache Kafka, Event Hub, or IoT Hub.
Is Databricks a competitor of Snowflake?
Databricks and Snowflake are direct competitors in cloud data warehousing, although both shun that term. Snowflake now calls its product a “data cloud,” while Databricks coined the term “lakehouse” to describe a fusion between free-form data lakes and structured data warehouses.