
Showing posts with the label technical

Understanding Wide-Column Stores

To understand the concept more clearly, let’s start with an example of a relational database table, Customers:

Id  Product  FN        Date        Country
1   Dell     Harry     17/04/2018  India
2   Dell     Harry     17/04/2018  India
3   Apple    Ron       17/04/2018  India
4   Sony     Ron       17/04/2018  South Africa
5   Sony     Hermione  17/04/2018  South Africa

When we convert this to a column store, this is how it looks:

FN             LN            Product     Date             Country
Id   Value     Id   Value    Id   Value  Id   Value       Id   Value
1-2  Harry     1-2  Potter   1-2  Dell   1-5  17/04/2018  1-3  India
3-4  Ron       3-4  Weasley  3    Apple                   4-5  South Africa
5    Hermione  5    Granger  4-5  Sony

Now a wide-column store will group the columns that are frequently accessed together into one, like this:

Id   Name
1-2  FN Harry     LN Potter
3-4  FN Ron       LN Weasley
5    FN Hermione  LN Granger

Some key...
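The conversion in the Customers example can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real wide-column database stores data. The function names `to_column_store` and `group_family` are made up for this sketch, and the LN values are added to the rows as an assumption, because the relational table does not list them while the column store does.

```python
from itertools import groupby

FIELDS = ("Id", "FN", "LN", "Product", "Date", "Country")

# Rows from the Customers example. LN is an assumption added for this sketch:
# the relational table omits it, but the column store uses it.
ROWS = [
    dict(zip(FIELDS, values))
    for values in [
        (1, "Harry", "Potter", "Dell", "17/04/2018", "India"),
        (2, "Harry", "Potter", "Dell", "17/04/2018", "India"),
        (3, "Ron", "Weasley", "Apple", "17/04/2018", "India"),
        (4, "Ron", "Weasley", "Sony", "17/04/2018", "South Africa"),
        (5, "Hermione", "Granger", "Sony", "17/04/2018", "South Africa"),
    ]
]


def to_column_store(rows, key="Id"):
    """Return {column: [(id_range, value), ...]}.

    Consecutive rows that share a value are merged into one id range,
    so the rows are expected to be sorted by id.
    """
    store = {}
    for column in (c for c in rows[0] if c != key):
        runs = []
        for value, group in groupby(rows, key=lambda r: r[column]):
            ids = [r[key] for r in group]
            label = str(ids[0]) if len(ids) == 1 else f"{ids[0]}-{ids[-1]}"
            runs.append((label, value))
        store[column] = runs
    return store


def group_family(store, columns):
    """Group columns that are read together into one family per id range."""
    family = {}
    for column in columns:
        for id_range, value in store[column]:
            family.setdefault(id_range, {})[column] = value
    return family


store = to_column_store(ROWS)
name_family = group_family(store, ["FN", "LN"])

for column, runs in store.items():
    print(column, runs)
print("Name", name_family)
```

Note that `group_family` lines up cleanly here only because FN and LN change at the same ids. Columns with different id ranges, such as Product and Country, would need their ranges split before they could share a family.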

Relational and Non-relational databases

In the old days, when we thought of a 'database', we would picture only tables and columns, primary and foreign keys, complex joins, intersection tables and so on. We often use ER diagrams and relationships to design and define such models. These databases are widely used and stable. They adhere to ACID principles and provide immediate consistency, both of which matter most when it comes to transaction handling. The most widely used products are Oracle, Teradata, SAP HANA, IBM Db2, MySQL, Amazon Aurora, MariaDB (the default MySQL-compatible database on many Linux distributions) and PostgreSQL. Then why is there a need for non-relational databases? The reason is data itself. Data is now considered the most valuable resource, even more than oil! No wonder social networking sites like Facebook and Twitter actually run the world. They decide the trends, what people read, what they think. All from the power of data. Anyway, that’s a topic for another post. Data from web, genomics data, data ...

People who bought also bought..

Recommendation systems are everywhere: amazon.com, Netflix, YouTube, supermarket advertising, you name it! When it all started a few years back, I used to think, wow! How does Amazon know which dress I might like next? That was really interesting. Now we all know that it’s the magic of data science. Machine learning algorithms continuously capture data, learn from it and provide recommendations. Today we are going through it in a bit more detail. So, how can data predict what the user is going to like or buy? Based on prior knowledge. And here comes our first association rule learning algorithm, Apriori. The name tells us that it predicts something based on prior knowledge. Apriori works on three factors: support, confidence and lift. Let’s say that out of 100 people, 10 like the movie ‘Fantastic Beasts and Where to Find Them’. So we can estimate the probability at 10% for any other set of Netflix watchers. This ...
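To make the three factors concrete, here is a minimal Python sketch that computes support, confidence and lift using their standard definitions. It is not the full Apriori algorithm (there is no candidate generation or pruning), and apart from the "10 out of 100" figure from the text, all the numbers and the second title `movie_b` are made up for illustration.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions)


def support(transactions, items):
    """Share of all transactions that contain `items`."""
    return count(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions with `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """How much more likely `consequent` is once `antecedent` is present.

    Written with integer counts: n * count(A and B) / (count(A) * count(B)).
    """
    both = set(antecedent) | set(consequent)
    n = len(transactions)
    return (count(transactions, both) * n) / (
        count(transactions, antecedent) * count(transactions, consequent)
    )


# 100 hypothetical watchers: 10 like Fantastic Beasts (as in the text),
# 8 of those also like a made-up second title, 12 like only that title.
watchers = (
    [{"fantastic_beasts", "movie_b"}] * 8
    + [{"fantastic_beasts"}] * 2
    + [{"movie_b"}] * 12
    + [set()] * 78
)

s = support(watchers, {"fantastic_beasts"})
c = confidence(watchers, {"fantastic_beasts"}, {"movie_b"})
l = lift(watchers, {"fantastic_beasts"}, {"movie_b"})
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 suggests that liking the first movie makes liking the second more likely than it is for watchers in general. A lift close to 1 means the two are roughly independent.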

Into the Cloud

I started my career in the CRM industry in 2007. Ten years have passed since then. And look how everything has changed! It’s not only Siebel CRM to SFDC, or only a change of CRM platform. It’s not only the move from waterfall to agile. There is more to it. Siebel CRM was a revolutionary product in itself, and so were others at that time. Companies like Oracle and SAP started to come up with products, rather than only services. Now there was no need to start coding from scratch to build your CRM or ERP platform. Only customisations to these ready-made products were needed. These products were spread across various horizontals and verticals, making them almost ready to use for any industry. No need to write thousands of lines of code! These were cutting-edge technologies in those times. And then Cloud walked in. Companies used to spend a lot of money and time managing the infrastructure. With cloud, this burden was taken off IT. Now you have freedom to utilise these efforts in mo...

Data governance and (hence) Metadata Management

Data is a strategic asset for businesses today and it is growing exponentially. Regulations like GDPR, BCBS 239 and Basel III are making it extremely important for organisations to have full control over their data. Data governance is about the availability, usability, integrity and security of data. Is this something new we are learning here? Actually, no! Data governance has been in place for years. Every system has some form of data governance in place. Documents and Excel sheets are a traditional form of data governance. So are access control mechanisms implemented at system level. Operational fields in a database, like created by/updated by, are also part of this governance. Audit logs provide useful information for governance purposes. But is this sufficient? We are implementing new systems and new functionalities every day. Is the organisation working in silos where individual departments might have different standards? Is managing metadata centrally in excel she...

The basics of creating a machine learning model

How do we create a machine learning model? Here I am going to talk about the step-by-step approach to designing a machine learning model. You can use any programming language, such as R or Python. The steps to create the model will remain the same. First of all, we need to understand the business problem. That’s the basic, right?! Even if you are not an expert in that business domain, it is still extremely important to understand the data and the problem we are trying to solve with this statistical analysis. Now, let’s perform basic EDA. What is EDA? It’s Exploratory Data Analysis. Here we will do some basic analysis on the data. There are various commands available in R and Python which help us do this analysis. We can also use a tool like Tableau for data mining. Remember, Tableau is not only for glossy business reports. It’s a very effective tool for statistical data analysis as well. You can also perform A/B tests or chi-squared tests to verify the findings from data minin...
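As a small illustration of the chi-squared check mentioned above, here is a sketch in plain Python for a 2x2 A/B table. The counts are made up, the function name `chi_squared` is only for this example, and no continuity correction is applied. In practice you would more likely reach for a statistics library in R or Python.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical A/B test: [converted, not converted] for each variant.
observed = [
    [30, 70],  # variant A
    [45, 55],  # variant B
]

# Critical value for 1 degree of freedom at the 5% significance level.
CRITICAL_95_DF1 = 3.841

statistic = chi_squared(observed)
significant = statistic > CRITICAL_95_DF1
print(f"chi-squared = {statistic:.2f}, significant at 5%: {significant}")
```

If the statistic is above the critical value, the difference between the two variants is unlikely to be down to chance alone, which supports the finding from the data mining step.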